
System Design Interview Preparation 2025

🎯 Strategy to Crack 80-90% System Design Interviews

This comprehensive guide covers Low-Level Design (LLD) and High-Level Design (HLD) topics that appear in 80-90% of system design interviews at FAANG and top tech companies. Master these patterns to excel in both junior and senior engineering roles.


📊 Coverage Overview

| Category | Topics | Priority | Time to Master |
|---|---|---|---|
| LLD Fundamentals | 8 | 🔴 Critical | 2 weeks |
| LLD Design Problems | 15 | 🔴 Critical | 3 weeks |
| Design Patterns | 12 | 🟡 High | 2 weeks |
| HLD Fundamentals | 10 | 🔴 Critical | 2 weeks |
| HLD Design Problems | 20 | 🔴 Critical | 4 weeks |
| System Components | 15 | 🟡 High | 2 weeks |
| Scalability Patterns | 10 | 🔴 Critical | 1 week |
| Databases & Storage | 8 | 🔴 Critical | 1.5 weeks |

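Total Preparation Time: 12-16 weeks with consistent practice (2-3 hours/day). The per-category estimates above add up to 17.5 weeks, so hitting the 12-16 week target means studying some categories in parallel.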


🏗️ LOW-LEVEL DESIGN (LLD)

Understanding LLD

What is LLD?

  • Object-oriented design of individual components
  • Class diagrams, relationships, and interactions
  • Code-level implementation focus
  • SOLID principles and design patterns

When is LLD Asked?

  • Junior to Mid-level (SDE-1, SDE-2)
  • First rounds of interviews
  • Machine coding rounds
  • Some senior roles at specific companies

1️⃣ LLD Fundamentals (8 Topics) 🔴

Must Master

1. Object-Oriented Programming Principles

Key Concepts:

  • Encapsulation
  • Abstraction
  • Inheritance
  • Polymorphism

Interview Focus:

  • When to use inheritance vs composition
  • Abstract classes vs interfaces
  • Access modifiers and their impact

Common Questions:

  • "Explain polymorphism with a real-world example"
  • "Why is composition preferred over inheritance?"
  • "How does encapsulation improve code maintainability?"

2. SOLID Principles 🔥🔥🔥

Most Important for Interviews:

S - Single Responsibility Principle

  • A class should have only one reason to change
  • Example: Separate UserService from EmailService

O - Open/Closed Principle

  • Open for extension, closed for modification
  • Use interfaces and abstract classes

L - Liskov Substitution Principle

  • Subtypes must be substitutable for base types
  • Important for inheritance hierarchies

I - Interface Segregation Principle

  • Many client-specific interfaces are better than one general-purpose interface
  • Don't force clients to depend on methods they don't use

D - Dependency Inversion Principle

  • Depend on abstractions, not concretions
  • Use dependency injection

Interview Tips:

  • Always mention SOLID when discussing design
  • Give examples from previous projects
  • Show how it improves testability

3. UML Diagrams

Must Know:

  • Class diagrams (relationships, multiplicity)
  • Sequence diagrams (interaction flows)
  • Use case diagrams (system boundaries)

Key Relationships:

  • Association (has-a)
  • Aggregation (weak has-a)
  • Composition (strong has-a)
  • Inheritance (is-a)
  • Dependency (uses-a)

Tools:

  • Draw.io
  • Lucidchart
  • PlantUML (for code-to-diagram)

4. Class Relationships

Association: Teacher ←→ Student (bidirectional)
Aggregation: Department ◇→ Employee (weak ownership)
Composition: House ◆→ Room (strong ownership)
Inheritance: Dog ──▷ Animal (is-a relationship)
Dependency: OrderService - - -> EmailService (uses)

Interview Questions:

  • "What's the difference between aggregation and composition?"
  • "When would you use composition over inheritance?"

5. Design Principles

DRY (Don't Repeat Yourself)

  • Extract common code into reusable components
  • Use inheritance or composition

KISS (Keep It Simple, Stupid)

  • Simplest solution that works
  • Avoid over-engineering

YAGNI (You Aren't Gonna Need It)

  • Don't add functionality until needed
  • Avoid premature optimization

Law of Demeter

  • Only talk to immediate friends
  • Minimize coupling

6. Exception Handling & Error Management

Best Practices:

  • Use specific exceptions
  • Don't catch generic exceptions
  • Clean up resources (try-with-resources)
  • Log appropriately

Interview Focus:

  • Checked vs unchecked exceptions
  • When to create custom exceptions
  • Error propagation strategies

7. Concurrency & Thread Safety

Key Topics:

  • Synchronization
  • Race conditions
  • Deadlocks
  • Thread-safe collections
  • Immutability

Common Patterns:

  • Singleton with thread safety
  • Producer-Consumer pattern
  • Thread pools
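A minimal Producer-Consumer sketch, assuming one producer thread, one consumer thread, and a bounded `ArrayBlockingQueue` whose `put`/`take` block when the queue is full/empty. The class name and the `-1` "poison pill" end-of-stream marker are illustrative choices, not a standard API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Producer-consumer sketch: a bounded queue decouples the two threads and applies back-pressure.
class ProducerConsumerDemo {
    private static final int POISON_PILL = -1; // hypothetical sentinel telling the consumer to stop

    // Produces 1..count on one thread, consumes on another, and returns what was consumed.
    static List<Integer> run(int count, int capacity) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        List<Integer> consumed = new ArrayList<>(); // touched only by the consumer thread

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= count; i++) {
                    queue.put(i);            // blocks while the queue is full
                }
                queue.put(POISON_PILL);      // signal end of stream
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    int item = queue.take(); // blocks while the queue is empty
                    if (item == POISON_PILL) break;
                    consumed.add(item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();                 // join() also makes the consumer's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return consumed;
    }
}
```

With a capacity much smaller than `count`, the producer is repeatedly forced to wait for the consumer, which is exactly the back-pressure a thread pool's work queue provides.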

8. Testing & Testability

Principles:

  • Write testable code
  • Use dependency injection
  • Mock external dependencies
  • Unit tests vs integration tests

Interview Questions:

  • "How do you make your code testable?"
  • "What's the difference between mocking and stubbing?"

2️⃣ LLD Design Problems (15 Problems) 🔴

Category A: Object-Oriented Design (Must-Do)

1. Parking Lot System 🔥🔥🔥

Difficulty: Medium | Frequency: Very High

Requirements:

  • Multiple floors with parking spots
  • Different vehicle types (car, truck, motorcycle, electric)
  • Different spot types (compact, large, handicapped, electric)
  • Entry/exit with ticket
  • Pricing strategy
  • Find available spots
  • Spot reservation

Key Classes:

ParkingLot, Floor, ParkingSpot, Vehicle
Ticket, Payment, PricingStrategy
VehicleType (enum), SpotType (enum)

Important Concepts:

  • Strategy pattern (pricing)
  • Factory pattern (vehicle/spot creation)
  • Singleton (ParkingLot)
  • Observer pattern (availability notifications)

Interview Focus:

  • How to handle concurrent requests?
  • How to find nearest available spot?
  • Database schema design
  • Extend for electric vehicle charging
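For "find nearest available spot" and "concurrent requests", one simple approach is to keep free spots per type in a set ordered by distance from the entrance and make allocation `synchronized`, so two concurrent requests can never receive the same spot. This is a sketch: `SpotType` and `ParkingSpot` reuse the class names listed above, but their fields (including the hypothetical `distance` in metres) and `SpotAllocator` are assumptions.

```java
import java.util.Comparator;
import java.util.EnumMap;
import java.util.Map;
import java.util.TreeSet;

enum SpotType { COMPACT, LARGE, ELECTRIC }

final class ParkingSpot {
    final int id;
    final SpotType type;
    final int distance; // hypothetical: walking distance from the entrance, in metres

    ParkingSpot(int id, SpotType type, int distance) {
        this.id = id;
        this.type = type;
        this.distance = distance;
    }
}

class SpotAllocator {
    // Nearest first; ties broken by id so distinct spots never compare as equal.
    private static final Comparator<ParkingSpot> BY_DISTANCE =
            Comparator.comparingInt((ParkingSpot s) -> s.distance).thenComparingInt(s -> s.id);

    private final Map<SpotType, TreeSet<ParkingSpot>> free = new EnumMap<>(SpotType.class);

    SpotAllocator() {
        for (SpotType t : SpotType.values()) {
            free.put(t, new TreeSet<>(BY_DISTANCE));
        }
    }

    // Called when a spot is added to the lot or a vehicle leaves.
    synchronized void release(ParkingSpot spot) {
        free.get(spot.type).add(spot);
    }

    // synchronized: concurrent entries cannot be handed the same spot.
    // Returns null when no spot of this type is free.
    synchronized ParkingSpot allocateNearest(SpotType type) {
        return free.get(type).pollFirst();
    }
}
```

Both operations are O(log n) per spot type. In a multi-instance deployment the same guarantee would have to come from the database (for example a conditional update on the spot row) rather than a JVM lock.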

Common Follow-ups:

  • "How would you handle peak hours?"
  • "Design a reservation system"
  • "Add a payment gateway"
  • "Handle handicapped spot priority"

2. Library Management System 🔥🔥

Difficulty: Medium | Frequency: High

Requirements:

  • Add/remove books
  • Search books (title, author, ISBN)
  • Issue/return books
  • Multiple copies of same book
  • Member management
  • Late fee calculation
  • Reservation system

Key Classes:

Library, Book, BookItem, Member
Librarian, Catalog, Search
Lending, Reservation, Fine

Important Concepts:

  • Strategy pattern (search strategies)
  • Observer pattern (availability notifications)
  • State pattern (book states: available, issued, reserved)

Interview Focus:

  • How to handle multiple copies?
  • Search optimization
  • Late fee calculation
  • Extend for ebooks

3. Hotel Management System 🔥🔥

Difficulty: Medium | Frequency: High

Requirements:

  • Room booking
  • Different room types
  • Search available rooms
  • Booking cancellation
  • Guest management
  • Housekeeping management
  • Room service

Key Classes:

Hotel, Room, RoomType, Booking
Guest, Receptionist, Housekeeper
RoomService, Payment

Important Concepts:

  • State pattern (room states)
  • Factory pattern (room creation)
  • Strategy pattern (pricing)
  • Observer pattern (housekeeping alerts)

Interview Focus:

  • Handling concurrent bookings
  • Overbooking strategy
  • Dynamic pricing
  • Integration with payment systems

4. Elevator System 🔥🔥🔥

Difficulty: Hard | Frequency: Very High

Requirements:

  • Multiple elevators
  • Up/down buttons on each floor
  • Destination buttons inside elevator
  • Optimal elevator selection
  • Emergency stop
  • Weight limit
  • Door open/close

Key Classes:

ElevatorSystem, Elevator, Floor
Button, Request, Direction (enum)
ElevatorController, Scheduler

Important Concepts:

  • Strategy pattern (scheduling algorithm)
  • State pattern (elevator states)
  • Command pattern (requests)
  • Observer pattern (floor updates)

Scheduling Algorithms:

  • FCFS (First Come, First Served)
  • SCAN (elevator algorithm)
  • LOOK algorithm
  • Destination dispatch
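A minimal sketch of the LOOK algorithm for a single elevator, assuming requests are just target floors: keep serving the nearest request in the current direction, and reverse only when nothing is left ahead (SCAN would instead travel to the last floor before reversing). `LookScheduler` and `serviceOrder` are illustrative names; a real controller would also merge new requests while moving.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// LOOK scheduling sketch for one elevator (not a full controller).
class LookScheduler {
    // Returns the order in which the pending floor requests are served.
    static List<Integer> serviceOrder(int currentFloor, boolean goingUp, List<Integer> requests) {
        TreeSet<Integer> pending = new TreeSet<>(requests); // sorted; duplicate floors collapse
        List<Integer> order = new ArrayList<>();
        int floor = currentFloor;
        boolean up = goingUp;
        while (!pending.isEmpty()) {
            // Nearest request at or beyond the current position in the travel direction.
            Integer next = up ? pending.ceiling(floor) : pending.floor(floor);
            if (next == null) {   // nothing left ahead: reverse direction
                up = !up;
                continue;
            }
            order.add(next);
            pending.remove(next);
            floor = next;
        }
        return order;
    }
}
```

For example, an elevator at floor 5 moving up with requests for floors 2, 8, 6, 1 and 9 serves 6, 8, 9 on the way up, then 2 and 1 on the way down.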

Interview Focus:

  • Optimal scheduling algorithm
  • Handle multiple requests
  • Emergency scenarios
  • Energy optimization

Common Follow-ups:

  • "How would you optimize for peak hours?"
  • "Design for high-rise buildings"
  • "Add priority for emergency services"

5. ATM System 🔥🔥

Difficulty: Medium | Frequency: High

Requirements:

  • Cash withdrawal
  • Balance inquiry
  • PIN verification
  • Cash deposit
  • Mini statement
  • Card reader
  • Cash dispenser

Key Classes:

ATM, Card, Account, Bank
Transaction, CashDispenser
CardReader, Screen, Keypad

Important Concepts:

  • State pattern (ATM states)
  • Chain of Responsibility (cash dispensing)
  • Proxy pattern (bank connection)
  • Command pattern (transactions)

Interview Focus:

  • Security considerations
  • Handling insufficient cash
  • Network failures
  • Concurrent withdrawals

6. Online Shopping System (E-commerce) 🔥🔥🔥

Difficulty: Medium-Hard | Frequency: Very High

Requirements:

  • Product catalog
  • Shopping cart
  • Order management
  • Payment processing
  • Inventory management
  • User accounts
  • Search and filter
  • Notifications

Key Classes:

Product, Category, ShoppingCart
Order, OrderItem, Payment
User, Seller, Admin
Inventory, Notification

Important Concepts:

  • Strategy pattern (payment, shipping)
  • Observer pattern (inventory, notifications)
  • Factory pattern (product types)
  • Decorator pattern (product customization)

Interview Focus:

  • Handling cart abandonment
  • Inventory synchronization
  • Concurrent purchases
  • Payment gateway integration

7. Car Rental System 🔥🔥

Difficulty: Medium | Frequency: High

Requirements:

  • Search available vehicles
  • Reserve vehicles
  • Rental process
  • Return process
  • Calculate charges
  • Late fees
  • Vehicle maintenance
  • Multiple locations

Key Classes:

Vehicle, Reservation, Branch
Customer, RentalTransaction
VehicleType, Insurance

Important Concepts:

  • State pattern (vehicle states)
  • Strategy pattern (pricing)
  • Factory pattern (vehicle types)

Interview Focus:

  • Handling overlapping reservations
  • Dynamic pricing
  • Maintenance scheduling
  • Multi-location management

8. Movie Ticket Booking System 🔥🔥🔥

Difficulty: Medium | Frequency: Very High

Requirements:

  • List movies and showtimes
  • Select seats
  • Book tickets
  • Payment processing
  • Cancellation
  • Multiple cinema halls
  • Different pricing (weekday/weekend)
  • Food ordering

Key Classes:

Movie, Show, Theater, Hall
Seat, Booking, Payment
Customer, Admin

Important Concepts:

  • Strategy pattern (pricing)
  • State pattern (seat states)
  • Observer pattern (seat availability)
  • Factory pattern (ticket types)

Interview Focus:

  • Concurrent seat booking (locking mechanism)
  • Seat selection UI/UX
  • Cancellation policy
  • Dynamic pricing
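For concurrent seat booking, a common idea is a temporary, expiring hold: the first user to grab a seat holds it for a few minutes while paying, and everyone else is rejected until the hold expires. The sketch below assumes an in-memory `ConcurrentHashMap` (whose `compute` runs atomically per key) and passes the current time in explicitly so it can be tested; `SeatHoldService` and the seat/user ids are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;

// Seat-hold sketch: one user may hold a seat at a time; holds expire after holdMillis.
class SeatHoldService {
    private static final class Hold {
        final String userId;
        final long expiresAtMillis;

        Hold(String userId, long expiresAtMillis) {
            this.userId = userId;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentHashMap<String, Hold> holds = new ConcurrentHashMap<>();
    private final long holdMillis;

    SeatHoldService(long holdMillis) {
        this.holdMillis = holdMillis;
    }

    // Atomically holds seatId for userId unless another user has an unexpired hold.
    boolean tryHold(String seatId, String userId, long nowMillis) {
        Hold result = holds.compute(seatId, (seat, existing) -> {
            if (existing == null
                    || existing.expiresAtMillis <= nowMillis   // previous hold expired
                    || existing.userId.equals(userId)) {       // same user refreshes the hold
                return new Hold(userId, nowMillis + holdMillis);
            }
            return existing;                                   // someone else still holds it
        });
        return result.userId.equals(userId);
    }
}
```

On successful payment, the hold would typically be converted into a permanent booking in the database (for example a row protected by a unique constraint on show and seat), which remains the source of truth across servers.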

Common Follow-ups:

  • "How to handle seat blocking during booking?"
  • "Design for multiple cinema chains"
  • "Add recommendation system"

Category B: Design Patterns Implementation (Important)

9. Vending Machine 🔥🔥

Difficulty: Medium | Frequency: High

Requirements:

  • Select product
  • Insert money (coins/notes)
  • Dispense product
  • Return change
  • Handle insufficient money
  • Product inventory

Key Classes:

VendingMachine, Product, Inventory
State (Idle, HasMoney, Dispensing)
Coin, Note

Important Concepts:

  • State pattern (machine states) 🔥
  • Strategy pattern (payment)
  • Singleton (machine instance)

States:

  1. Idle
  2. HasMoney
  3. Dispensing
  4. OutOfStock

Interview Focus:

  • State transitions
  • Change calculation
  • Concurrent access
  • Inventory management
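A minimal State-pattern sketch of the four states above, assuming a single product, prices in cents, and change returned as a number rather than computed in coins. Each state object decides how to react to `insertMoney` and `select`; the machine simply delegates to its current state. All names are illustrative.

```java
// State-pattern sketch for a single-product vending machine (amounts in cents).
class VendingMachine {
    interface State {
        void insertMoney(VendingMachine m, int cents);
        int select(VendingMachine m); // returns change in cents
    }

    private static final State IDLE = new State() {
        public void insertMoney(VendingMachine m, int cents) {
            m.balance += cents;
            m.state = HAS_MONEY;
        }
        public int select(VendingMachine m) {
            throw new IllegalStateException("Insert money first");
        }
    };

    private static final State HAS_MONEY = new State() {
        public void insertMoney(VendingMachine m, int cents) {
            m.balance += cents;                       // accept more money
        }
        public int select(VendingMachine m) {
            if (m.balance < m.price) throw new IllegalStateException("Insufficient money");
            m.state = DISPENSING;
            return m.dispense();
        }
    };

    // Transient state while the product is released; all input is rejected.
    private static final State DISPENSING = new State() {
        public void insertMoney(VendingMachine m, int cents) {
            throw new IllegalStateException("Busy dispensing");
        }
        public int select(VendingMachine m) {
            throw new IllegalStateException("Busy dispensing");
        }
    };

    private static final State OUT_OF_STOCK = new State() {
        public void insertMoney(VendingMachine m, int cents) {
            throw new IllegalStateException("Out of stock");
        }
        public int select(VendingMachine m) {
            throw new IllegalStateException("Out of stock");
        }
    };

    private final int price;
    private int stock;
    private int balance;
    private State state;

    VendingMachine(int price, int stock) {
        this.price = price;
        this.stock = stock;
        this.state = stock > 0 ? IDLE : OUT_OF_STOCK;
    }

    void insertMoney(int cents) { state.insertMoney(this, cents); }
    int select() { return state.select(this); }
    boolean isOutOfStock() { return state == OUT_OF_STOCK; }

    // Releases one item, returns the change, and leaves DISPENSING.
    private int dispense() {
        stock--;
        int change = balance - price;
        balance = 0;
        state = stock > 0 ? IDLE : OUT_OF_STOCK;
        return change;
    }
}
```

Adding a new state (for example a maintenance mode) means adding one more `State` object rather than editing `if/else` chains spread across every method.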

10. Chess Game 🔥🔥

Difficulty: Hard | Frequency: Medium

Requirements:

  • Valid moves for each piece
  • Check and checkmate detection
  • Castling, en passant
  • Pawn promotion
  • Game state management
  • Move history

Key Classes:

Game, Board, Square, Piece
Player, Move, GameState
King, Queen, Rook, Bishop, Knight, Pawn

Important Concepts:

  • Strategy pattern (piece moves)
  • Command pattern (moves)
  • Memento pattern (undo)
  • State pattern (game states)

Interview Focus:

  • Valid move calculation
  • Check detection algorithm
  • AI opponent (optional)

11. Snake & Ladder Game 🔥

Difficulty: Easy-Medium | Frequency: Medium

Requirements:

  • Board with 100 cells
  • Snakes and ladders
  • Multiple players
  • Dice roll
  • Win condition
  • Game state

Key Classes:

Game, Board, Player, Dice
Snake, Ladder, Cell

Important Concepts:

  • Strategy pattern (dice roll)
  • Observer pattern (player position updates)

12. Notification Service 🔥🔥

Difficulty: Medium | Frequency: High

Requirements:

  • Multiple channels (Email, SMS, Push)
  • Priority levels
  • Retry mechanism
  • Template management
  • User preferences
  • Delivery status

Key Classes:

Notification, NotificationService
EmailChannel, SMSChannel, PushChannel
Template, UserPreference
DeliveryStatus

Important Concepts:

  • Strategy pattern (channels)
  • Observer pattern (status updates)
  • Factory pattern (channel creation)
  • Template method (notification sending)
  • Chain of Responsibility (retry logic)

Interview Focus:

  • Handle failures gracefully
  • Rate limiting
  • User preference management
  • Scale to millions of notifications

Category C: Real-World Applications (Nice to Have)

13. Logging Framework 🔥🔥

Difficulty: Medium | Frequency: Medium

Requirements:

  • Multiple log levels (DEBUG, INFO, WARN, ERROR)
  • Multiple output targets (console, file, database)
  • Log formatting
  • Log rotation
  • Configuration
  • Async logging

Key Classes:

Logger, LogLevel, LogAppender
ConsoleAppender, FileAppender
LogFormatter, Configuration

Important Concepts:

  • Singleton (Logger instance)
  • Strategy pattern (appenders)
  • Builder pattern (log configuration)
  • Chain of Responsibility (log levels)
  • Observer pattern (multiple appenders)

14. Cache System (LRU Cache) 🔥🔥🔥

Difficulty: Medium | Frequency: Very High

Requirements:

  • Get and Put in O(1)
  • Evict least recently used
  • Capacity limit
  • Thread safety (optional)
  • TTL support (optional)

Key Classes:

Cache, CacheEntry
DoublyLinkedList, HashMap
EvictionPolicy

Important Concepts:

  • Strategy pattern (eviction policies)
  • Singleton (cache instance)

Eviction Policies:

  • LRU (Least Recently Used)
  • LFU (Least Frequently Used)
  • FIFO (First In First Out)
  • Random

Interview Focus:

  • HashMap + Doubly Linked List implementation
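  • Thread safety (in an LRU cache, get() also reorders the list, so a ReadWriteLock's shared read lock is not enough; get and put both need an exclusive lock)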
  • Generics for type safety
  • Memory management
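A minimal sketch of the HashMap + doubly linked list implementation: the map gives O(1) lookup of a node, and the list keeps nodes in recency order so moving a node to the front or evicting from the back is also O(1). Sentinel head/tail nodes remove null checks. This version is deliberately not thread-safe; the class name is illustrative. (For comparison, Java's `LinkedHashMap` with `accessOrder = true` and an overridden `removeEldestEntry` gives the same behaviour out of the box.)

```java
import java.util.HashMap;
import java.util.Map;

// LRU cache sketch: O(1) get/put via HashMap + doubly linked list. Not thread-safe.
class LRUCache<K, V> {
    private final class Node {
        K key;
        V value;
        Node prev, next;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    private final int capacity;
    private final Map<K, Node> index = new HashMap<>();
    private final Node head = new Node(null, null); // sentinel on the most-recently-used side
    private final Node tail = new Node(null, null); // sentinel on the least-recently-used side

    LRUCache(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
        head.next = tail;
        tail.prev = head;
    }

    V get(K key) {
        Node node = index.get(key);
        if (node == null) return null;
        moveToFront(node);                 // a read counts as a use
        return node.value;
    }

    void put(K key, V value) {
        Node node = index.get(key);
        if (node != null) {                // update in place and mark as recent
            node.value = value;
            moveToFront(node);
            return;
        }
        if (index.size() == capacity) {    // evict the least recently used entry
            Node lru = tail.prev;
            unlink(lru);
            index.remove(lru.key);
        }
        node = new Node(key, value);
        index.put(key, node);
        addFront(node);
    }

    private void moveToFront(Node node) {
        unlink(node);
        addFront(node);
    }

    private void unlink(Node node) {
        node.prev.next = node.next;
        node.next.prev = node.prev;
    }

    private void addFront(Node node) {
        node.next = head.next;
        node.prev = head;
        head.next.prev = node;
        head.next = node;
    }
}
```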

15. Meeting Scheduler 🔥🔥

Difficulty: Medium | Frequency: High

Requirements:

  • Check availability
  • Book meeting rooms
  • Invite participants
  • Handle conflicts
  • Recurring meetings
  • Cancellation

Key Classes:

MeetingRoom, Meeting, Participant
Calendar, TimeSlot, Booking
Scheduler

Important Concepts:

  • Strategy pattern (conflict resolution)
  • Observer pattern (participant notifications)
  • Factory pattern (meeting types)

Interview Focus:

  • Interval overlap detection
  • Optimal room allocation
  • Handle time zones
  • Recurring meetings logic
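The first two focus points reduce to classic interval problems. Assuming meetings are half-open intervals [start, end) in minutes (so a 10:00 end and a 10:00 start do not clash), two meetings overlap exactly when each starts before the other ends, and the minimum number of rooms can be found by sorting by start time and keeping a min-heap of end times for rooms in use. `MeetingIntervals` is an illustrative name.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Interval sketches for a meeting scheduler; intervals are half-open [start, end).
class MeetingIntervals {
    // Two meetings conflict iff each starts before the other ends.
    static boolean overlaps(int start1, int end1, int start2, int end2) {
        return start1 < end2 && start2 < end1;
    }

    // Minimum rooms: sort by start; the heap holds the end time of every room in use.
    static int minRooms(int[][] meetings) {
        int[][] sorted = meetings.clone();           // don't reorder the caller's array
        Arrays.sort(sorted, (a, b) -> Integer.compare(a[0], b[0]));
        PriorityQueue<Integer> endTimes = new PriorityQueue<>();
        for (int[] m : sorted) {
            if (!endTimes.isEmpty() && endTimes.peek() <= m[0]) {
                endTimes.poll();                     // earliest-ending room is free: reuse it
            }
            endTimes.offer(m[1]);
        }
        return endTimes.size();                      // rooms ever needed at the same time
    }
}
```

This runs in O(n log n). Recurring meetings can be handled by expanding each occurrence inside the query window into concrete intervals before applying the same checks.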

3️⃣ Design Patterns (12 Patterns) 🟡

Creational Patterns

1. Singleton Pattern 🔥🔥🔥

Use Cases: Database connection, Logger, Configuration manager

Thread-Safe Implementation:

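```java
public class Singleton {
    // volatile: other threads never see a reference to a partially constructed instance
    private static volatile Singleton instance;

    private Singleton() {} // private constructor blocks outside instantiation

    public static Singleton getInstance() {
        if (instance == null) {                 // 1st check: no locking once initialized
            synchronized (Singleton.class) {
                if (instance == null) {         // 2nd check: another thread may have won the race
                    instance = new Singleton();
                }
            }
        }
        return instance;
    }
}
```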

Interview Questions:

  • Why double-checked locking?
  • Why volatile keyword?
  • What is the Bill Pugh Singleton (static inner holder class)?
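On the questions above: the outer null check avoids taking the lock on every call once the instance exists, the inner check stops a thread that was waiting on the lock from creating a second instance, and `volatile` prevents a thread from seeing a non-null reference to an object whose constructor has not finished. The Bill Pugh variant avoids explicit locking by relying on the JVM's lazy, thread-safe class initialization. A minimal sketch (the class name `HolderSingleton` is illustrative):

```java
// Bill Pugh (initialization-on-demand holder) Singleton: lazy and thread-safe without locks.
final class HolderSingleton {
    private HolderSingleton() {}

    // Holder is not initialized until getInstance() first references it,
    // and the JVM guarantees class initialization happens exactly once.
    private static final class Holder {
        static final HolderSingleton INSTANCE = new HolderSingleton();
    }

    static HolderSingleton getInstance() {
        return Holder.INSTANCE;
    }
}
```

A single-element `enum` is another commonly cited option, since it also resists creating extra instances through serialization or reflection.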

2. Factory Pattern 🔥🔥

Use Cases: Creating objects without specifying their exact class

When to Use:

  • Vehicle creation (Car, Truck, Motorcycle)
  • Payment method (Credit, Debit, UPI)
  • Notification channel (Email, SMS, Push)

3. Abstract Factory Pattern 🔥

Use Cases: Creating families of related objects

Example: UI components for different OS (Windows, Mac, Linux)


4. Builder Pattern 🔥🔥

Use Cases: Complex object construction

Example: Building a complex query, HTTP request, Pizza order

When to Use:

  • Many constructor parameters
  • Optional parameters
  • Immutable objects

5. Prototype Pattern 🔥

Use Cases: Cloning existing objects instead of creating new ones from scratch

Example: Document templates, Game characters


Structural Patterns

6. Adapter Pattern 🔥🔥

Use Cases: Making incompatible interfaces work together

Example:

  • Legacy system integration
  • Third-party library integration
  • XML to JSON converter

7. Decorator Pattern 🔥🔥

Use Cases: Adding behavior dynamically

Example:

  • Pizza toppings (base + cheese + olives)
  • Coffee add-ons (coffee + milk + sugar)
  • Stream decorators (BufferedInputStream)

8. Proxy Pattern 🔥

Use Cases: Controlling access to objects

Types:

  • Virtual Proxy (lazy loading)
  • Protection Proxy (access control)
  • Remote Proxy (remote objects)

Example: Image lazy loading, Database connection pooling


Behavioral Patterns

9. Strategy Pattern 🔥🔥🔥

Use Cases: Selecting algorithm at runtime

Examples:

  • Payment methods (Credit, Debit, UPI, Wallet)
  • Sorting strategies (QuickSort, MergeSort)
  • Pricing strategies (Regular, Holiday, Member)
  • Compression algorithms (ZIP, RAR, 7Z)

Most Important for Interviews!
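A minimal Strategy sketch using the pricing example above: the algorithm sits behind an interface, and the context (`Checkout`) receives it at runtime, so new rules are added without modifying `Checkout` (Open/Closed Principle). The 20% holiday surcharge and 10% member discount are hypothetical rules chosen for illustration; amounts are in cents.

```java
// Strategy sketch: the pricing algorithm is chosen at runtime and injected into the context.
interface PricingStrategy {
    long priceInCents(long baseCents);
}

class RegularPricing implements PricingStrategy {
    public long priceInCents(long baseCents) { return baseCents; }
}

// Hypothetical rule: 20% holiday surcharge.
class HolidayPricing implements PricingStrategy {
    public long priceInCents(long baseCents) { return baseCents * 120 / 100; }
}

// Hypothetical rule: members get 10% off.
class MemberPricing implements PricingStrategy {
    public long priceInCents(long baseCents) { return baseCents * 90 / 100; }
}

class Checkout {
    private final PricingStrategy strategy; // depends on the abstraction, not a concrete rule

    Checkout(PricingStrategy strategy) { this.strategy = strategy; }

    long total(long baseCents) { return strategy.priceInCents(baseCents); }
}
```

Because `PricingStrategy` has a single abstract method, a one-off rule can also be passed as a lambda, e.g. `new Checkout(base -> base / 2)`.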


10. Observer Pattern 🔥🔥🔥

Use Cases: One-to-many dependency

Examples:

  • Event listeners
  • Stock price updates
  • Notification system
  • MVC architecture

Implementation: Subject and Observer interfaces
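A minimal sketch of the Subject/Observer structure using the stock-price example above: observers register with the subject and are pushed every update. `StockTicker` (the subject) and `PriceObserver` are illustrative names, and the ticker symbol in the usage is made up.

```java
import java.util.ArrayList;
import java.util.List;

// Observer sketch: the subject keeps a list of observers and notifies each one on change.
interface PriceObserver {
    void onPriceChange(String symbol, double price);
}

class StockTicker { // the Subject
    private final List<PriceObserver> observers = new ArrayList<>();

    void subscribe(PriceObserver o) { observers.add(o); }

    void unsubscribe(PriceObserver o) { observers.remove(o); }

    void publish(String symbol, double price) {
        for (PriceObserver o : observers) {
            o.onPriceChange(symbol, price); // push model: the update carries the new data
        }
    }
}
```

If observers may subscribe or unsubscribe from inside a callback, iterating a plain `ArrayList` would throw `ConcurrentModificationException`; `CopyOnWriteArrayList` is a common substitute in that case.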


11. State Pattern 🔥🔥

Use Cases: Object behavior changes with state

Examples:

  • Vending machine states
  • Order states (Pending, Processing, Shipped, Delivered)
  • Traffic light states
  • Connection states

12. Command Pattern 🔥

Use Cases: Encapsulating requests as objects

Examples:

  • Undo/Redo functionality
  • Task scheduling
  • Remote control operations

🏛️ HIGH-LEVEL DESIGN (HLD)

Understanding HLD

What is HLD?

  • System architecture at a high level
  • Component interactions
  • Scalability and reliability
  • Trade-offs and constraints

When is HLD Asked?

  • Mid to Senior level (SDE-2, SDE-3, Staff)
  • Final rounds of interviews
  • Architect roles
  • Leadership positions

4️⃣ HLD Fundamentals (10 Topics) 🔴

1. System Design Framework (RESHADED) 🔥🔥🔥

R - Requirements (Functional & Non-Functional)

  • What does the system do?
  • Who are the users?
  • Scale expectations?

E - Estimations (Back-of-envelope)

  • QPS (Queries Per Second)
  • Storage requirements
  • Bandwidth
  • Memory

S - Storage Schema (Database Design)

  • SQL vs NoSQL
  • Schema design
  • Partitioning strategy

H - High-level Design (Architecture)

  • Draw initial architecture
  • Identify components
  • Data flow

A - API Design (System Interface)

  • REST endpoints
  • Parameters and responses
  • Authentication

D - Detailed Design

  • Deep dive into core components
  • Algorithms and data structures
  • Database schema

E - Evaluation (Scalability & Bottlenecks)

  • Identify bottlenecks
  • Scale each component
  • Trade-offs

D - Distinctive Component/Feature (Deep Dives)

  • Specific challenging aspects
  • Edge cases
  • Failure scenarios

2. Scalability Principles 🔥🔥🔥

Vertical Scaling (Scale Up)

  • Add more CPU, RAM, Disk
  • Limitations: Hardware limits, downtime
  • When to use: Quick fix, monolithic apps

Horizontal Scaling (Scale Out)

  • Add more machines
  • Benefits: Removes the single server as a point of failure; capacity grows by adding machines
  • Challenges: Data consistency, session management

Key Concepts:

  • Stateless services
  • Load balancing
  • Caching layers
  • Database replication
  • Microservices

3. Load Balancing 🔥🔥🔥

Purpose:

  • Distribute traffic across servers
  • Health checks
  • SSL termination

Algorithms:

  • Round Robin
  • Least Connections
  • Weighted Round Robin
  • IP Hash
  • Least Response Time
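Two of these algorithms, sketched as in-process selection logic rather than a real load balancer: Round Robin cycles through servers with an atomic counter, and Least Connections picks the server with the fewest active connections (the caller must release the connection when the request finishes). Server names are hypothetical, and `List.copyOf` requires Java 10+.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Selection-logic sketches for two load-balancing algorithms.
class LoadBalancers {
    // Round Robin: cycle through servers in order.
    static final class RoundRobin {
        private final List<String> servers;
        private final AtomicInteger next = new AtomicInteger();

        RoundRobin(List<String> servers) { this.servers = List.copyOf(servers); }

        String pick() {
            // floorMod keeps the index valid even after the counter overflows
            return servers.get(Math.floorMod(next.getAndIncrement(), servers.size()));
        }
    }

    // Least Connections: choose the server with the fewest active connections.
    static final class LeastConnections {
        private final List<String> servers;
        private final int[] active;

        LeastConnections(List<String> servers) {
            this.servers = List.copyOf(servers);
            this.active = new int[servers.size()];
        }

        synchronized String acquire() {
            int best = 0;
            for (int i = 1; i < active.length; i++) {
                if (active[i] < active[best]) best = i; // ties go to the earlier server
            }
            active[best]++;
            return servers.get(best);
        }

        synchronized void release(String server) {
            int i = servers.indexOf(server);
            if (i >= 0 && active[i] > 0) active[i]--;
        }
    }
}
```

Round Robin assumes requests cost roughly the same; Least Connections adapts better when some requests (uploads, long polls) hold a server much longer than others.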

Types:

  • L4 (Transport layer) - Fast, TCP/UDP
  • L7 (Application layer) - Smart, HTTP/HTTPS

Popular Solutions:

  • NGINX
  • HAProxy
  • AWS ELB/ALB
  • Azure Load Balancer

4. Caching 🔥🔥🔥

Cache Levels:

  1. Browser cache
  2. CDN cache
  3. Application cache (Redis, Memcached)
  4. Database cache

Cache Strategies:

Read Strategies:

  • Cache Aside (Lazy Loading)
  • Read Through

Write Strategies:

  • Write Through (write to cache + DB)
  • Write Back (write to cache, async to DB)
  • Write Around (write to DB, invalidate cache)
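A minimal Cache Aside (lazy loading) sketch: reads check the cache first and fall back to the database on a miss, then populate the cache; writes go to the database and invalidate the cached copy. Plain `HashMap`s stand in for Redis and the real database, and the `dbReads` counter exists only to show when the database is hit; a production version would also set a TTL on cached entries.

```java
import java.util.HashMap;
import java.util.Map;

// Cache-aside sketch with in-memory maps standing in for Redis and a database.
class CacheAsideRepository {
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> database;
    int dbReads = 0; // demonstration only: how many times the DB was queried

    CacheAsideRepository(Map<String, String> database) {
        this.database = database;
    }

    String read(String key) {
        String value = cache.get(key);             // 1. try the cache
        if (value != null) return value;           //    hit
        dbReads++;
        value = database.get(key);                 // 2. miss: load from the DB
        if (value != null) cache.put(key, value);  // 3. populate the cache for next time
        return value;
    }

    // Update the DB first, then invalidate (not update) the cached copy.
    void write(String key, String value) {
        database.put(key, value);
        cache.remove(key);
    }
}
```

Invalidating rather than overwriting the cache on writes avoids one class of race where two concurrent writers leave the cache holding the older value.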

Eviction Policies:

  • LRU (Least Recently Used)
  • LFU (Least Frequently Used)
  • FIFO
  • TTL (Time To Live)

Cache Invalidation:

  • Time-based (TTL)
  • Event-based
  • Manual purge

Popular Tools:

  • Redis
  • Memcached
  • Varnish

5. Database Design 🔥🔥🔥

SQL vs NoSQL Decision Tree:

Use SQL When:

  • ACID transactions required
  • Complex queries with JOINs
  • Structured data
  • Consistency over availability
  • Examples: Banking, E-commerce orders

Use NoSQL When:

  • High write throughput
  • Flexible schema
  • Horizontal scaling
  • Availability over consistency
  • Examples: Social media feeds, Logging

NoSQL Types:

  1. Document DB: MongoDB, CouchDB

    • Use: User profiles, product catalogs
  2. Key-Value: Redis, DynamoDB

    • Use: Session storage, caching
  3. Column-Family: Cassandra, HBase

    • Use: Time-series data, analytics
  4. Graph DB: Neo4j, Amazon Neptune

    • Use: Social networks, recommendation engines

Database Scaling:

Read Scaling:

  • Read replicas
  • Master-Slave replication
  • Database caching

Write Scaling:

  • Sharding (horizontal partitioning)
  • Partitioning strategies:
    • Range-based
    • Hash-based
    • Directory-based
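Shard routing for the first two strategies can be sketched in a few lines. Hash-based routing spreads keys evenly but remaps most keys when the shard count changes (consistent hashing is the usual mitigation); range-based routing keeps nearby keys together, which helps range scans but can create hot shards. `String.hashCode()` is used only for illustration; real systems typically use a stronger hash. Directory-based routing would replace the range map with an explicit key-to-shard lookup table.

```java
import java.util.Map;
import java.util.TreeMap;

// Shard-routing sketches for hash-based and range-based partitioning.
class ShardRouter {
    // Hash-based: floorMod keeps the result non-negative for negative hash codes.
    static int hashShard(String key, int shardCount) {
        return Math.floorMod(key.hashCode(), shardCount);
    }

    // Range-based: each shard owns keys from its lower bound up to the next shard's bound.
    private final TreeMap<Long, Integer> rangeStarts = new TreeMap<>();

    void addRange(long lowerBoundInclusive, int shardId) {
        rangeStarts.put(lowerBoundInclusive, shardId);
    }

    int rangeShard(long key) {
        Map.Entry<Long, Integer> e = rangeStarts.floorEntry(key); // greatest bound <= key
        if (e == null) throw new IllegalArgumentException("No shard owns key " + key);
        return e.getValue();
    }
}
```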

Replication:

  • Master-Slave
  • Master-Master
  • Quorum-based
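The usual quorum condition can be written out. With N replicas, writes acknowledged by W of them, and reads querying R of them, every read set $Q_R$ and write set $Q_W$ share at least one replica whenever R + W > N, so the reader can pick the newest version it sees (this assumes versions or timestamps to compare; sloppy quorums and concurrent writes weaken the guarantee):

```latex
|Q_R \cap Q_W| \;\ge\; |Q_R| + |Q_W| - N \;=\; R + W - N \;\ge\; 1
\qquad \text{whenever } R + W > N
```

For example, N = 3, W = 2, R = 2 guarantees an overlap of at least 2 + 2 - 3 = 1 replica.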

6. Message Queues 🔥🔥

Purpose:

  • Asynchronous communication
  • Decouple services
  • Load leveling (buffering traffic spikes so consumers work at their own rate)
  • Retry logic

Patterns:

  • Producer-Consumer
  • Pub-Sub
  • Request-Reply

Use Cases:

  • Email notifications
  • Image processing
  • Order processing
  • Log aggregation

Popular Tools:

  • Apache Kafka (high throughput, streaming)
  • RabbitMQ (flexible routing)
  • AWS SQS (managed)
  • Redis Pub-Sub (lightweight)

Kafka Deep Dive:

  • Topics and partitions
  • Consumer groups
  • Offset management
  • Retention policies

7. Microservices Architecture 🔥🔥

Benefits:

  • Independent deployment
  • Technology diversity
  • Scalability
  • Fault isolation

Challenges:

  • Network latency
  • Data consistency
  • Debugging complexity
  • Testing

Key Patterns:

  • API Gateway
  • Service Discovery (Consul, Eureka)
  • Circuit Breaker (Hystrix, now in maintenance mode; Resilience4j)
  • Saga pattern (distributed transactions)
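A minimal circuit-breaker sketch showing the three usual states. This is not the API of Hystrix or Resilience4j; the threshold, open duration, and injectable clock are assumptions chosen so the behaviour is easy to test. `call` is `synchronized` for simplicity, which serializes calls; a production breaker would avoid holding a lock during the remote call.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Circuit-breaker sketch.
// CLOSED: calls pass through; `failureThreshold` consecutive failures -> OPEN.
// OPEN: calls fail fast to the fallback until `openMillis` elapses -> HALF_OPEN.
// HALF_OPEN: one trial call; success -> CLOSED, failure -> OPEN again.
class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clock;   // e.g. System::currentTimeMillis; injectable for tests
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    CircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < openMillis) return fallback.get(); // fail fast
            state = State.HALF_OPEN;                                             // allow a trial
        }
        try {
            T result = action.get();
            state = State.CLOSED;          // success closes the circuit
            consecutiveFailures = 0;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;
                openedAt = clock.getAsLong();
            }
            return fallback.get();
        }
    }

    synchronized State state() { return state; }
}
```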

Communication:

  • Synchronous: REST, gRPC
  • Asynchronous: Message queues, Event streams

8. API Design 🔥🔥

REST Principles:

  • Stateless
  • Resource-based URLs
  • HTTP methods (GET, POST, PUT, DELETE)
  • HTTP status codes
  • HATEOAS

Best Practices:

  • Versioning (/api/v1/)
  • Pagination
  • Rate limiting
  • Authentication (JWT, OAuth)
  • Error handling

API Gateway:

  • Single entry point
  • Authentication
  • Rate limiting
  • Request routing
  • Response aggregation
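Rate limiting appears twice above (as an API best practice and a gateway duty). One common algorithm is the token bucket, sketched here: the bucket holds up to `capacity` tokens (the allowed burst), refills at a steady rate (the sustained limit), and each request spends one token. Fixed-window and sliding-window counters are alternatives. The current time is passed in for testability, and the parameters in the usage are hypothetical.

```java
// Token-bucket rate limiter sketch (one bucket, e.g. per API key or per client IP).
class TokenBucket {
    private final long capacity;          // maximum burst size
    private final double refillPerMilli;  // sustained rate
    private double tokens;
    private long lastRefillMillis;

    TokenBucket(long capacity, double refillPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerMilli = refillPerSecond / 1000.0;
        this.tokens = capacity;           // start full
        this.lastRefillMillis = nowMillis;
    }

    // Returns true if the request may proceed; otherwise the caller would typically answer HTTP 429.
    synchronized boolean tryAcquire(long nowMillis) {
        long elapsed = Math.max(0, nowMillis - lastRefillMillis); // ignore clocks going backwards
        tokens = Math.min(capacity, tokens + elapsed * refillPerMilli);
        lastRefillMillis = Math.max(lastRefillMillis, nowMillis);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

For example, `new TokenBucket(2, 1.0, 0)` allows a burst of two requests, then roughly one request per second.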

GraphQL vs REST:

  • GraphQL: Flexible queries, single endpoint
  • REST: Cacheable, well-established

9. CAP Theorem 🔥🔥

Three Properties:

  • Consistency: All nodes see same data
  • Availability: Every request gets a response
  • Partition Tolerance: System works despite network failures

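Reality: Network partitions cannot be ruled out in a distributed system, so the real choice is between consistency and availability while a partition is happening ("pick 2 of 3" is a simplification)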

Examples:

  • CP: Banking systems (Consistency + Partition tolerance)
  • AP: Social media feeds (Availability + Partition tolerance)
  • CA: Single-node database (not distributed)

PACELC Theorem:

  • Extension of CAP
  • If Partition, choose A or C
  • Else (no partition), choose Latency or Consistency

10. Consistency Patterns 🔥🔥

Strong Consistency:

  • All reads return latest write
  • Example: Banking transactions
  • Achieved: Single-leader replication, Paxos/Raft

Eventual Consistency:

  • Reads may return stale data temporarily
  • Example: Social media likes, DNS
  • Achieved: Multi-leader, Leaderless replication

Consistency Models:

  1. Linearizability (strongest)
  2. Sequential Consistency
  3. Causal Consistency
  4. Eventual Consistency (weakest)

5️⃣ HLD Design Problems (20 Problems) 🔴

Category A: Social Media & Content (Must-Do)

1. Design Twitter / X 🔥🔥🔥

Difficulty: Hard | Frequency: Very High

Functional Requirements:

  • Post tweets (280 characters; originally 140)
  • Follow/unfollow users
  • Timeline (home feed)
  • Like, retweet, reply
  • Trending topics
  • Search tweets

Non-Functional Requirements:

  • 200M DAU
  • High availability (99.99%)
  • Low latency for reads (<100ms)
  • Eventual consistency acceptable
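A back-of-envelope estimate for these requirements, written as code so the arithmetic is explicit. Only the 200M DAU figure comes from the requirements; tweets per user per day, timeline loads per user per day, and bytes per tweet are assumptions for illustration, and results are averages (peak traffic is usually planned as a multiple of the average).

```java
// Back-of-envelope estimate for the Twitter requirements.
// Only DAU comes from the requirements; every other input is an assumption.
class TwitterEstimate {
    static final long DAU = 200_000_000L;                    // from the requirements
    static final long TWEETS_PER_USER_PER_DAY = 2;           // assumption
    static final long TIMELINE_LOADS_PER_USER_PER_DAY = 20;  // assumption
    static final long BYTES_PER_TWEET = 300;                 // assumption: text + metadata, no media
    static final long SECONDS_PER_DAY = 86_400;

    // Average write QPS (integer division rounds down).
    static long writeQps() {
        return DAU * TWEETS_PER_USER_PER_DAY / SECONDS_PER_DAY;
    }

    // Average timeline-read QPS.
    static long readQps() {
        return DAU * TIMELINE_LOADS_PER_USER_PER_DAY / SECONDS_PER_DAY;
    }

    // New tweet storage per day in GB (decimal, 10^9 bytes), excluding media.
    static long storageGbPerDay() {
        return DAU * TWEETS_PER_USER_PER_DAY * BYTES_PER_TWEET / 1_000_000_000L;
    }
}
```

Under these assumptions that is about 4,629 writes/s, 46,296 timeline reads/s (a 10:1 read:write ratio), and 120 GB of new tweet text per day, or roughly 44 TB per year before replication.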

Key Components:

API Gateway → Application Servers
Tweet Service, Timeline Service, Follow Service
User Service, Notification Service
Redis Cache, PostgreSQL/Cassandra
S3 for media, CDN
Kafka for async processing

Database Design:

Users: user_id, username, bio, followers_count
Tweets: tweet_id, user_id, content, created_at
Follows: follower_id, followee_id, created_at
Likes: user_id, tweet_id

Timeline Generation:

  • Fan-out on Write: Pre-compute timelines, fast reads
    • Push model: Write to all followers' timelines
    • Good for users with few followers
  • Fan-out on Read: Compute on demand, slow reads
    • Pull model: Fetch tweets on read
    • Good for celebrities with millions of followers
  • Hybrid: Fan-out for normal users, pull for celebrities
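A minimal sketch of the hybrid approach: tweets from normal authors are pushed into followers' precomputed inboxes at write time, while tweets from celebrities are pulled and merged at read time. In-memory maps stand in for Redis timelines and the follow graph; the celebrity threshold of 3 followers is deliberately tiny and hypothetical, and the sketch assumes tweet ids increase over time (as Snowflake-style ids do) so sorting by id gives newest first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hybrid fan-out sketch: push for normal authors, pull for celebrities.
class HybridTimeline {
    static final int CELEBRITY_THRESHOLD = 3; // hypothetical; real cut-offs are far larger

    private final Map<Long, Set<Long>> followers = new HashMap<>(); // author -> followers
    private final Map<Long, Set<Long>> following = new HashMap<>(); // user -> followees
    private final Map<Long, List<Long>> inbox = new HashMap<>();    // user -> pushed tweet ids
    private final Map<Long, List<Long>> authored = new HashMap<>(); // author -> own tweet ids

    void follow(long follower, long followee) {
        followers.computeIfAbsent(followee, k -> new HashSet<>()).add(follower);
        following.computeIfAbsent(follower, k -> new HashSet<>()).add(followee);
    }

    boolean isCelebrity(long author) {
        return followers.getOrDefault(author, Set.of()).size() >= CELEBRITY_THRESHOLD;
    }

    void postTweet(long author, long tweetId) {
        authored.computeIfAbsent(author, k -> new ArrayList<>()).add(tweetId);
        if (isCelebrity(author)) return;   // pull model: merged at read time instead
        for (long f : followers.getOrDefault(author, Set.of())) {
            inbox.computeIfAbsent(f, k -> new ArrayList<>()).add(tweetId); // push model
        }
    }

    // Merge the precomputed inbox with celebrities' tweets, newest first, de-duplicated.
    List<Long> timeline(long user, int limit) {
        TreeSet<Long> merged = new TreeSet<>(Comparator.reverseOrder());
        merged.addAll(inbox.getOrDefault(user, List.of()));
        for (long followee : following.getOrDefault(user, Set.of())) {
            if (isCelebrity(followee)) merged.addAll(authored.getOrDefault(followee, List.of()));
        }
        List<Long> result = new ArrayList<>();
        for (long id : merged) {
            if (result.size() == limit) break;
            result.add(id);
        }
        return result;
    }
}
```

The `TreeSet` also removes duplicates that appear when an author crosses the threshold after some of their tweets were already pushed.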

Scalability:

  • Shard by user_id or tweet_id
  • Cache timelines in Redis
  • CDN for media files
  • Read replicas for followers count

Interview Focus:

  • Timeline generation algorithm
  • Handle celebrity problem (Bieber problem)
  • Trending topics algorithm
  • Real-time updates (WebSockets)

Common Follow-ups:

  • "How would you implement trending topics?"
  • "Design the search feature"
  • "Handle viral tweets"
  • "Design analytics for tweets"

2. Design Instagram 🔥🔥🔥

Difficulty: Hard | Frequency: Very High

Functional Requirements:

  • Upload/view photos and videos
  • Follow users
  • News feed
  • Like, comment
  • Stories (24-hour ephemeral)
  • Direct messaging

Non-Functional Requirements:

  • 500M DAU
  • Low latency for image loading
  • High storage (petabytes of images)
  • Reliable uploads

Key Components:

Image Upload Service
Feed Generation Service
User Service
CDN (Cloudflare, Akamai)
S3/Blob Storage
Redis Cache
PostgreSQL + Cassandra

Image Storage:

  • Original images in S3
  • Multiple sizes (thumbnail, medium, full)
  • CDN for fast delivery
  • Pre-signed URLs for uploads

Feed Ranking:

  • Chronological (early Instagram)
  • ML-based ranking (current)
    • User engagement history
    • Post recency
    • Relationship strength
    • Post type (photo, video, reel)

Stories:

  • Ephemeral storage (24 hours)
  • Separate storage system
  • Ring buffer for efficiency

Scalability:

  • Geo-distributed CDNs
  • Image sharding by user_id
  • Separate read/write databases
  • Cache frequently accessed feeds

Interview Focus:

  • Image upload optimization
  • Feed ranking algorithm
  • Stories implementation
  • Handle high read:write ratio (100:1)

3. Design YouTube / Netflix 🔥🔥🔥

Difficulty: Hard | Frequency: Very High

Functional Requirements:

  • Upload videos
  • Stream videos (adaptive bitrate)
  • Search videos
  • Recommendations
  • Comments, likes
  • Subscriptions

Non-Functional Requirements:

  • 2B+ users
  • High bandwidth
  • Low latency streaming
  • 99.9% availability
  • Support multiple resolutions (360p to 4K)

Key Components:

Video Upload Service → Transcoding Service
Video Streaming Service (HLS/DASH)
CDN (Akamai, Cloudflare)
Recommendation Engine
Search Service (Elasticsearch)
Metadata DB (Cassandra)
Object Storage (S3)
Kafka for analytics

Video Processing Pipeline:

  1. Upload → S3
  2. Transcoding (FFmpeg)
    • Multiple resolutions (360p, 480p, 720p, 1080p, 4K)
    • Multiple codecs (H.264, H.265/HEVC, VP9)
    • Packaging for adaptive bitrate streaming (HLS, DASH)
  3. Thumbnail generation
  4. Content moderation (AI/ML)
  5. Store in distributed storage
  6. Update metadata DB
  7. Invalidate CDN cache

Streaming:

  • Adaptive Bitrate Streaming (ABR)
    • HLS (HTTP Live Streaming) - Apple
    • DASH (Dynamic Adaptive Streaming over HTTP)
  • Client adjusts quality based on bandwidth
  • Chunked delivery (2-10 second segments)
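"Client adjusts quality based on bandwidth" can be illustrated with the simplest ABR heuristic: before requesting the next segment, pick the highest rendition whose bitrate fits within a discounted estimate of measured throughput. Real players also weigh buffer level and smooth their throughput estimates. The bitrate ladder and the 0.8 safety factor below are hypothetical values, not YouTube's or Netflix's.

```java
// Throughput-based bitrate selection sketch (one simple ABR heuristic).
class BitrateSelector {
    // Hypothetical bitrate ladder in kbps, lowest to highest.
    static final int[] LADDER_KBPS = {800, 1_400, 2_800, 5_000, 16_000};
    static final String[] LABELS = {"360p", "480p", "720p", "1080p", "4K"};
    static final double SAFETY_FACTOR = 0.8; // headroom for throughput fluctuations (assumption)

    // Picks the highest rendition that fits the discounted throughput estimate.
    static String choose(double measuredKbps) {
        double budget = measuredKbps * SAFETY_FACTOR;
        int chosen = 0; // fall back to the lowest rendition if nothing fits
        for (int i = 0; i < LADDER_KBPS.length; i++) {
            if (LADDER_KBPS[i] <= budget) chosen = i;
        }
        return LABELS[chosen];
    }
}
```

Because segments are only a few seconds long, the player can re-run this choice for every segment and switch renditions without interrupting playback.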

CDN Architecture:

  • Multi-tier CDN
  • Edge locations worldwide
  • Popular videos cached at edge
  • Long-tail videos served from origin

Recommendation System:

  • Collaborative filtering
  • Content-based filtering
  • Deep learning models
  • Real-time and batch processing

Scalability:

  • Video sharding by video_id
  • Geo-distributed CDNs
  • Multiple data centers
  • Read replicas for metadata

Interview Focus:

  • Transcoding pipeline optimization
  • Adaptive bitrate streaming
  • CDN strategy
  • Recommendation algorithm
  • Cost optimization (storage + bandwidth)

Common Follow-ups:

  • "How to handle live streaming?"
  • "Design the recommendation system"
  • "Handle copyright detection"
  • "Optimize for mobile bandwidth"

4. Design Facebook / Meta 🔥🔥🔥

Difficulty: Hard | Frequency: Very High

Functional Requirements:

  • News feed
  • Post (text, images, videos)
  • Like, comment, share
  • Friend requests
  • Notifications
  • Groups
  • Messenger integration

Non-Functional Requirements:

  • 3B+ users
  • High availability
  • Low latency (<200ms)
  • Strong consistency for friend relationships

Key Components:

User Service
Post Service
News Feed Service
Friend Service
Notification Service
Graph Data Store (TAO, backed by MySQL)
MySQL Shards
Memcached/Redis
CDN

News Feed Algorithm:

  • EdgeRank scoring:
    • Affinity Score (relationship strength)
    • Weight (content type)
    • Time Decay
  • ML-based ranking
  • Personalization
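EdgeRank is commonly summarized, based on Facebook's public descriptions around 2010, as a sum over a story's edges (interactions such as posts, likes, and comments) of the three factors listed above. Treat this as the textbook form, not Facebook's current ML ranking:

```latex
\text{EdgeRank}(\text{story}) \;=\; \sum_{e \,\in\, \text{edges(story)}} u_e \cdot w_e \cdot d_e
```

Here $u_e$ is the viewer's affinity with the user who created edge $e$, $w_e$ is the weight of the edge type (a comment is usually described as weighing more than a like), and $d_e$ is a time-decay factor that shrinks as the edge ages.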

Scalability:

  • TAO (The Associations and Objects) - distributed graph
  • MySQL sharding by user_id
  • Feed caching in Memcached
  • Async processing with queues

Interview Focus:

  • Friend graph storage (TAO)
  • News feed generation at scale
  • Real-time notifications
  • Consistency in friend relationships

Category B: E-commerce & Marketplaces (Must-Do)

5. Design Amazon / E-commerce Platform 🔥🔥🔥

Difficulty: Hard | Frequency: Very High

Functional Requirements:

  • Product catalog
  • Search and filter
  • Shopping cart
  • Order management
  • Payment processing
  • Inventory management
  • Recommendations
  • Reviews and ratings

Non-Functional Requirements:

  • 100M+ products
  • 50M DAU
  • Strong consistency for inventory
  • Low latency for search
  • 99.99% availability

Key Components:

Product Catalog Service
Search Service (Elasticsearch)
Cart Service
Order Service
Payment Service
Inventory Service
Recommendation Engine
Review Service
CDN for images

Database Design:

Products: product_id, name, description, price, category
Inventory: product_id, warehouse_id, quantity
Orders: order_id, user_id, status, total_amount
Order_Items: order_id, product_id, quantity, price
Users: user_id, name, email, addresses
Reviews: review_id, product_id, user_id, rating, comment

Search System:

  • Elasticsearch for full-text search
  • Filters (price, rating, brand)
  • Autocomplete
  • Typo tolerance
  • Ranking algorithm

Cart Management:

  • Store in Redis (session-based)
  • Persistent cart in DB
  • Cart expiration (30 days)

Inventory Management:

  • Real-time inventory updates
  • Reservation system during checkout
  • Distributed locks to prevent overselling
  • Eventual consistency for reads
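An alternative (or complement) to distributed locks for preventing overselling is an atomic conditional decrement: subtract stock only if enough remains, in one indivisible step. In SQL this is usually a single `UPDATE ... SET quantity = quantity - ? WHERE product_id = ? AND quantity >= ?` followed by a check that one row was updated. The sketch below mimics that with `ConcurrentHashMap.computeIfPresent`, which runs atomically per key; `InventoryStore` and the SKU ids are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for an atomic conditional decrement on inventory.
class InventoryStore {
    private final ConcurrentHashMap<String, Integer> stock = new ConcurrentHashMap<>();

    void restock(String productId, int qty) {
        stock.merge(productId, qty, Integer::sum);
    }

    // Reserves qty units atomically; returns false instead of overselling.
    boolean tryReserve(String productId, int qty) {
        boolean[] reserved = {false};
        stock.computeIfPresent(productId, (id, available) -> {
            if (available < qty) return available; // not enough: leave unchanged
            reserved[0] = true;
            return available - qty;
        });
        return reserved[0];
    }

    int available(String productId) {
        return stock.getOrDefault(productId, 0);
    }
}
```

However many concurrent checkouts race for the last units, the check and the decrement happen together, so the count can never go below zero.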

Order Processing:

  1. Add to cart → Reserve inventory
  2. Checkout → Payment processing
  3. Payment success → Create order
  4. Update inventory → Send to warehouse
  5. Shipping → Delivery

Payment Flow:

  • Payment gateway integration (Stripe, Razorpay)
  • Idempotency for duplicate requests
  • 3D Secure authentication
  • Fraud detection
  • Refund handling

Scalability:

  • Product catalog in NoSQL (Cassandra)
  • Shard by product_id or category
  • Cache popular products
  • Separate read/write databases
  • CDN for product images

Interview Focus:

  • Inventory consistency (prevent overselling)
  • Search optimization
  • Payment processing reliability
  • Flash sales handling
  • Recommendation algorithm

Common Follow-ups:

  • "How to handle flash sales (e.g., iPhone launch)?"
  • "Design the recommendation system"
  • "Handle concurrent checkouts for last item"
  • "Design fraud detection"

6. Design Uber / Ride-Sharing 🔥🔥🔥

Difficulty: Hard | Frequency: Very High

Functional Requirements:

  • Rider requests ride
  • Match with nearby driver
  • Real-time location tracking
  • ETA calculation
  • Fare calculation
  • Rating system
  • Payment

Non-Functional Requirements:

  • Millions of rides per day
  • Low latency for matching (<5 seconds)
  • High availability
  • Accurate location tracking

Key Components:

Rider Service
Driver Service
Matching Service
Location Service
Trip Service
Payment Service
Notification Service
QuadTree/Geohash for location
Kafka for real-time streams
Redis for caching
PostgreSQL/Cassandra

Location Service (Geospatial Indexing):

  • QuadTree
  • Geohash
  • S2 Geometry (Google)
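To show how Geohash turns a coordinate into a prefix-searchable string, here is a minimal Python sketch of the standard encoding. It alternately bisects the longitude and latitude ranges, starting with longitude, and emits one base32 character per 5 bits. The function name is illustrative.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    """Standard geohash: interleave longitude/latitude bisection bits, 5 bits per char."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    even = True  # the first bit always comes from longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1   # upper half
            rng[0] = mid
        else:
            bits = bits << 1         # lower half
            rng[1] = mid
        even = not even
        bit_count += 1
        if bit_count == 5:
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)
```

Points that share a longer prefix lie in the same, smaller cell, so "nearby drivers" becomes a prefix lookup. A real lookup also checks the 8 neighbouring cells, because two close points can straddle a cell boundary and share only a short prefix.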

Matching Algorithm:

  1. Rider requests ride
  2. Find nearby drivers (within 5km radius)
  3. Rank drivers by:
    • Distance
    • Driver rating
    • Acceptance rate
  4. Send request to top 3-5 drivers
  5. First to accept gets the ride
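The ranking step can be sketched in Python as a weighted score. The weights, the 1-5 rating normalisation, and the use of straight-line (haversine) distance are all assumptions for illustration. A production matcher would more likely rank on road-network ETA.

```python
import math
from dataclasses import dataclass


@dataclass
class Driver:
    driver_id: str
    lat: float
    lon: float
    rating: float           # 1.0 - 5.0
    acceptance_rate: float  # 0.0 - 1.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def top_drivers(rider_lat, rider_lon, drivers, radius_km=5.0, k=3,
                w_dist=0.6, w_rating=0.25, w_accept=0.15):
    """Keep drivers inside the radius, score them (higher is better), return the best k ids."""
    scored = []
    for d in drivers:
        dist = haversine_km(rider_lat, rider_lon, d.lat, d.lon)
        if dist > radius_km:
            continue
        closeness = 1.0 - dist / radius_km      # 1.0 at the rider, 0.0 at the radius edge
        rating = (d.rating - 1.0) / 4.0         # map 1-5 stars onto 0-1
        score = w_dist * closeness + w_rating * rating + w_accept * d.acceptance_rate
        scored.append((score, d.driver_id))
    scored.sort(reverse=True)
    return [driver_id for _, driver_id in scored[:k]]
```

The returned ids are the "top 3-5 drivers" that receive the request. In practice, the candidate list would come from the geospatial index rather than a full scan.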

Real-time Tracking:

  • Drivers send location every 4-5 seconds
  • WebSocket connection
  • Update in Redis cache
  • Persist in Cassandra (time-series)

ETA Calculation:

  • Historical traffic data
  • Real-time traffic (Google Maps API)
  • Machine learning models
  • Update dynamically

Fare Calculation:

  • Base fare
  • Per km/mile charge
  • Per minute charge
  • Surge pricing (demand-based)
  • Tolls and taxes

Surge Pricing:

  • Calculate demand/supply ratio per area
  • Apply multiplier (1.2x, 1.5x, 2x)
  • Update every minute
  • Notify riders
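The fare and surge bullets combine into simple arithmetic, sketched below in Python. Every rate, the minimum fare, the tax rate, and the demand/supply tier table are hypothetical placeholders. Only the 1.2x/1.5x/2x multipliers come from the text above.

```python
def surge_multiplier(open_requests: int, available_drivers: int,
                     tiers=((3.0, 2.0), (2.0, 1.5), (1.5, 1.2))) -> float:
    """Map an area's demand/supply ratio to a multiplier via a hypothetical tier table."""
    ratio = open_requests / max(available_drivers, 1)
    for threshold, multiplier in tiers:      # checked from the highest threshold down
        if ratio >= threshold:
            return multiplier
    return 1.0


def fare(distance_km, duration_min, surge=1.0, tolls=0.0, tax_rate=0.05,
         base=50.0, per_km=12.0, per_min=2.0, minimum=80.0):
    """Base + distance + time, scaled by surge; tolls and tax added afterwards (rates hypothetical)."""
    metered = base + per_km * distance_km + per_min * duration_min
    subtotal = max(metered * surge, minimum) + tolls
    return round(subtotal * (1 + tax_rate), 2)
```

For example, 30 open requests against 10 drivers gives a ratio of 3.0 and a 2.0x multiplier. Recomputing `surge_multiplier` per area every minute matches the "Update every minute" bullet.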

Database Design:

Riders: rider_id, name, phone, rating
Drivers: driver_id, name, phone, vehicle, rating, location
Trips: trip_id, rider_id, driver_id, start_location, end_location, fare, status
Locations: driver_id, lat, long, timestamp (time-series)

Scalability:

  • Shard by city/region (geosharding)
  • QuadTree for each region
  • Separate services per city
  • Real-time location in Redis
  • Historical data in Cassandra

Interview Focus:

  • Geospatial indexing (QuadTree vs Geohash)
  • Matching algorithm efficiency
  • Real-time location tracking
  • Surge pricing calculation
  • ETA accuracy

Common Follow-ups:

  • "How to handle peak hours?"
  • "Design Uber Pool (ride sharing)"
  • "Optimize for driver earnings"
  • "Handle driver going offline during trip"

7. Design Food Delivery (Uber Eats, DoorDash) 🔥🔥

Difficulty: Hard | Frequency: High

Functional Requirements:

  • Browse restaurants
  • Place order
  • Real-time order tracking
  • Delivery person assignment
  • Ratings and reviews

Non-Functional Requirements:

  • Low latency
  • High availability
  • Accurate ETA
  • Optimize delivery routes

Key Components:

  • Restaurant Service
  • Order Service
  • Delivery Service (matching algorithm)
  • Location Tracking Service
  • Notification Service

Challenges:

  • Three-way matching (customer, restaurant, delivery person)
  • Multiple pickup and delivery optimization
  • Keep food hot/fresh (time constraints)

Interview Focus:

  • Three-way logistics optimization
  • Route optimization for multiple orders
  • Real-time tracking

Category C: Communication & Collaboration (Important)

8. Design WhatsApp / Chat Messenger 🔥🔥🔥

Difficulty: Hard | Frequency: Very High

Functional Requirements:

  • One-on-one messaging
  • Group chat
  • Message delivery (sent, delivered, read)
  • Online/offline status
  • Media sharing
  • End-to-end encryption

Non-Functional Requirements:

  • 2B+ users
  • Real-time delivery (<1 second)
  • High availability
  • Message persistence

Key Components:

WebSocket Server (for real-time)
Message Service
User Service
Group Service
Media Service
Notification Service
Cassandra (messages)
Redis (online status)
S3 (media storage)

Real-time Communication:

  • WebSocket for bidirectional communication
  • Long polling (fallback)
  • XMPP (Extensible Messaging and Presence Protocol)

Message Flow:

  1. Sender → WebSocket Server
  2. Server checks receiver online status
  3. If online: Push via WebSocket
  4. If offline: Store in queue, send push notification
  5. Store message in DB (Cassandra)
  6. Acknowledge to sender
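The online/offline branch of the message flow can be modelled in a single Python process. In this hypothetical `ChatServer`, plain dicts and lists stand in for Redis (connections), Cassandra (message store), the offline queue, and the push service. One deliberate deviation: the sketch persists the message before attempting delivery, a common choice so that a crash mid-delivery cannot lose it.

```python
import itertools
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Message:
    message_id: int
    sender_id: str
    receiver_id: str
    content: str
    status: str = "sent"


class ChatServer:
    """Hypothetical model of the flow above; dicts stand in for Redis, Cassandra and the queue."""

    def __init__(self):
        self.connections = {}                   # user_id -> list acting as that user's socket
        self.store = {}                         # message_id -> Message (persistent storage)
        self.offline_queue = defaultdict(list)  # user_id -> pending message_ids
        self.push_notifications = []            # (user_id, message_id) handed to APNs/FCM
        self._ids = itertools.count(1)

    def connect(self, user_id):
        socket = self.connections.setdefault(user_id, [])
        for mid in self.offline_queue.pop(user_id, []):  # drain messages queued while offline
            self._deliver(socket, self.store[mid])
        return socket

    def disconnect(self, user_id):
        self.connections.pop(user_id, None)

    def send(self, sender_id, receiver_id, content):
        msg = Message(next(self._ids), sender_id, receiver_id, content)
        self.store[msg.message_id] = msg        # persist first
        socket = self.connections.get(receiver_id)
        if socket is not None:                  # receiver online: push immediately
            self._deliver(socket, msg)
        else:                                   # receiver offline: queue + push notification
            self.offline_queue[receiver_id].append(msg.message_id)
            self.push_notifications.append((receiver_id, msg.message_id))
        return msg.message_id                   # acknowledgment to the sender

    def _deliver(self, socket, msg):
        socket.append(msg.content)
        msg.status = "delivered"
```

In a real deployment, `connections` would map users to the connection server holding their socket, and the "delivered" status would be sent back to the sender as a receipt.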

Message Storage:

Messages: message_id, sender_id, receiver_id, content, timestamp, status
Groups: group_id, name, members, created_by
Group_Messages: message_id, group_id, sender_id, content, timestamp

Read Receipts:

  • Single grey tick (sent)
  • Double grey tick (delivered)
  • Double blue tick (read)
  • Send acknowledgments back to sender

Group Chat:

  • Max 1,024 members (current WhatsApp limit; it was 256 until 2022)
  • Fan-out to all members
  • Message ordering challenges
  • Admin privileges

Media Sharing:

  • Upload to S3
  • Generate thumbnail
  • Share URL in message
  • Progressive download

End-to-End Encryption:

  • Signal Protocol
  • Public/private key exchange
  • Server cannot read messages

Scalability:

  • Shard by user_id
  • Connection servers by region
  • Separate servers for media
  • Message queue for offline delivery

Interview Focus:

  • Real-time message delivery
  • Message ordering in groups
  • Last seen and online status
  • Encryption implementation
  • Scale to billions of messages

Common Follow-ups:

  • "How to implement message sync across devices?"
  • "Design group admin features"
  • "Handle user blocking"
  • "Implement disappearing messages"

9. Design Slack / Microsoft Teams 🔥🔥

Difficulty: Hard | Frequency: High

Functional Requirements:

  • Workspaces and channels
  • Direct messages
  • File sharing
  • Search messages
  • Threads
  • Reactions
  • Integrations (bots, webhooks)

Non-Functional Requirements:

  • Real-time messaging
  • Message history
  • High availability
  • Low latency

Key Components:

WebSocket Gateway
Channel Service
Message Service
Search Service (Elasticsearch)
File Service
Notification Service
PostgreSQL + Cassandra
Redis Cache

Differences from WhatsApp:

  • Workspace/channel hierarchy
  • Thread replies
  • Rich formatting
  • Integrations and bots
  • Search is critical

Channel Design:

  • Public vs private channels
  • Member management
  • Channel history
  • Unread counts

Search:

  • Full-text search (Elasticsearch)
  • Search within channels
  • Filter by date, person, file type
  • Message ranking

Scalability:

  • Shard by workspace_id
  • Separate WebSocket connections per workspace
  • Cache channel metadata

Interview Focus:

  • Workspace isolation
  • Real-time typing indicators
  • Thread implementation
  • Search at scale

10. Design Zoom / Video Conferencing 🔥🔥

Difficulty: Hard | Frequency: High

Functional Requirements:

  • Video/audio streaming
  • Screen sharing
  • Chat
  • Recording
  • Virtual backgrounds
  • Breakout rooms

Non-Functional Requirements:

  • Low latency (<300ms)
  • High quality video
  • Support 100+ participants
  • Reliable connectivity

Key Components:

Signaling Server (WebRTC)
Media Server (SFU - Selective Forwarding Unit)
TURN/STUN servers
Recording Service
Chat Service

Video Streaming:

  • WebRTC for peer-to-peer
  • SFU (Selective Forwarding Unit) for multi-party
    • Each participant uploads its stream once to the SFU
    • SFU forwards it to every other participant (no mixing)
    • Reduces client upload bandwidth compared with a peer-to-peer mesh
  • MCU (Multipoint Control Unit) - alternative
    • Mixes all streams
    • Higher server load
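The SFU vs MCU vs mesh tradeoff is mostly stream-count arithmetic per participant. The small Python helper below counts video streams only, ignoring audio and simulcast layers, and makes that arithmetic explicit.

```python
def streams_per_client(n: int, topology: str) -> tuple[int, int]:
    """(uploaded, downloaded) video streams for one participant in an n-person call."""
    if topology == "mesh":   # peer-to-peer: send to and receive from every other peer
        return n - 1, n - 1
    if topology == "sfu":    # send once; server forwards each other participant's stream
        return 1, n - 1
    if topology == "mcu":    # send once; server mixes everything into one composite stream
        return 1, 1
    raise ValueError(f"unknown topology: {topology}")


for topology in ("mesh", "sfu", "mcu"):
    print(topology, streams_per_client(10, topology))
```

In a 10-person call, mesh asks every laptop to upload 9 streams, which is why it breaks down beyond a handful of users. The SFU cuts uploads to 1. The MCU also cuts downloads to 1, but pays for it with server-side decoding and mixing.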

Architecture:

Client A ──┐
           ├──→ SFU Server ──→ Client C
Client B ──┘                   Client D

Bandwidth Optimization:

  • Adaptive bitrate
  • Simulcast (multiple qualities)
  • Active speaker detection
  • Gallery view vs speaker view

Scalability:

  • Multiple SFU servers
  • Route by geography
  • Scale based on concurrent meetings

Interview Focus:

  • WebRTC vs traditional streaming
  • SFU vs MCU tradeoff
  • Latency optimization
  • Handle poor network conditions

Category D: Search & Discovery (Important)

11. Design Google Search 🔥🔥🔥

Difficulty: Very Hard | Frequency: High

Functional Requirements:

  • Web crawling
  • Indexing
  • Search query processing
  • Ranking results
  • Autocomplete
  • Spell correction

Non-Functional Requirements:

  • Billions of web pages
  • Sub-second query response
  • High availability
  • Fresh results

Key Components:

Web Crawler (distributed)
Indexer (MapReduce)
Index Storage (inverted index)
Query Processor
Ranking Service (PageRank)
Cache Layer

Web Crawling:

  • Distributed crawlers
  • URL frontier (queue)
  • Politeness policy (robots.txt)
  • Priority queue for recrawling
  • Duplicate detection (URL fingerprinting)

Indexing:

  • Inverted index: term → list of documents
  • Forward index: document → list of terms
  • MapReduce for distributed indexing

Example Inverted Index:
"apple" → [doc1, doc5, doc23, ...]
"orange" → [doc2, doc5, doc18, ...]
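A minimal single-machine Python sketch of the inverted index above follows. The tokenizer (lowercase, split on non-alphanumerics) is a simplifying assumption; real indexers also stem words, drop stop words, and store sorted, compressed posting lists rather than Python sets.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """term -> set of doc_ids containing that term."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all(index, query: str) -> set[str]:
    """AND query: documents that contain every query term (posting-list intersection)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    if not postings:
        return set()
    return set.intersection(*postings)
```

At web scale, the same `term → postings` mapping is produced by MapReduce, with a map emitting `(term, doc_id)` pairs and a reduce grouping them per term.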

Ranking:

  • PageRank algorithm
  • TF-IDF (Term Frequency-Inverse Document Frequency)
  • Click-through rate
  • Dwell time
  • Freshness
  • Authority
  • 200+ ranking signals
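PageRank is only one of the signals listed, but it is the one most often asked about. Below is a textbook power-iteration version in Python, assuming the usual damping factor of 0.85 and that pages with no outlinks spread their rank evenly. It is a simplified sketch, not Google's production ranking.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power iteration. links maps each page to the pages it links to."""
    pages = set(links) | {p for outs in links.values() for p in outs}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}   # random-jump share
        for page in pages:
            outs = links.get(page, [])
            if outs:
                share = damping * rank[page] / len(outs)
                for target in outs:
                    new[target] += share
            else:  # dangling page: spread its rank over every page
                for target in pages:
                    new[target] += damping * rank[page] / n
        rank = new
    return rank
```

Ranks always sum to 1, and a page with no inbound links keeps only the random-jump share `(1 - damping) / n`.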

Query Processing:

  1. Spell correction
  2. Query expansion (synonyms)
  3. Lookup inverted index
  4. Rank results
  5. Apply personalization
  6. Return top K results

Autocomplete:

  • Trie data structure
  • Precompute popular queries
  • Personalization based on history
  • Update based on trending searches

Scalability:

  • Shard index by term
  • Replicate for availability
  • Cache popular queries
  • Geo-distributed data centers

Interview Focus:

  • Crawling strategy
  • Inverted index design
  • PageRank algorithm
  • Query optimization
  • Freshness vs relevance tradeoff

12. Design Typeahead / Autocomplete 🔥🔥

Difficulty: Medium | Frequency: High

Functional Requirements:

  • Suggest queries as user types
  • Top K suggestions
  • Real-time updates
  • Personalization

Non-Functional Requirements:

  • Low latency (<100ms)
  • High availability
  • Handle typos
  • Scale to millions of queries

Key Components:

Trie data structure
Cache (Redis)
Analytics service (Kafka + Spark)
Database (Cassandra)
CDN

Data Structure:

  • Trie with frequency counts
  • Each node stores the precomputed top K completions for its prefix

Suggestion Generation:

  1. User types "fac"
  2. Traverse Trie to node "fac"
  3. Return precomputed top K suggestions
    • "facebook"
    • "facebook login"
    • "factory"
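The "precomputed top K at every node" idea can be sketched in Python as a trie built in a batch job from aggregated query counts. This matches the hourly/daily update model described under Updates. The class names and the sample counts in the usage test are hypothetical.

```python
import heapq


class TrieNode:
    __slots__ = ("children", "top")

    def __init__(self):
        self.children = {}
        self.top = []   # [(query, count)], best first, at most k entries


class Typeahead:
    """Trie where every node caches the top-k completions for its prefix."""

    def __init__(self, query_counts: dict[str, int], k: int = 3):
        self.root = TrieNode()
        self.k = k
        for query, count in query_counts.items():
            self._insert(query, count)

    def _insert(self, query, count):
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            node.top.append((query, count))
            # highest count first, ties broken alphabetically; keep only k
            node.top = heapq.nsmallest(self.k, node.top, key=lambda qc: (-qc[1], qc[0]))

    def suggest(self, prefix: str) -> list[str]:
        """O(len(prefix)) walk, then return the cached list: no subtree search at query time."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [query for query, _ in node.top]
```

Storing the top K at each node trades memory for latency: a lookup costs only the prefix walk, which is what keeps responses under the 100ms target.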

Ranking:

  • Query frequency
  • Recency
  • User personalization
  • Geographic relevance

Updates:

  • Batch processing (hourly/daily)
  • Incremental updates
  • A/B testing new suggestions

Scalability:

  • Shard Trie by prefix
  • Cache hot prefixes
  • Separate Tries for different languages

Interview Focus:

  • Trie optimization
  • Real-time vs batch updates
  • Personalization strategy
  • Typo handling

Category E: Content & Media (Important)

13. Design TikTok / Short Video Platform 🔥🔥

Difficulty: Hard | Frequency: High

Functional Requirements:

  • Upload short videos (15-60 seconds)
  • Personalized feed (For You page)
  • Like, comment, share
  • Follow users
  • Trending content

Non-Functional Requirements:

  • Billions of videos
  • Highly engaging feed
  • Low latency for video loading
  • Recommendation accuracy

Key Components:

Video Upload Service
Transcoding Pipeline
Recommendation Engine (ML)
Feed Service
CDN
S3/Blob Storage
Redis Cache

For You Page (FYP) Algorithm:

  • Collaborative filtering
  • Content-based filtering
  • User behavior signals:
    • Watch time
    • Completion rate
    • Likes, shares, comments
    • Replays
  • Cold start problem (new users)
  • Diversity injection (avoid echo chamber)

Video Pipeline:

  1. Upload → S3
  2. Transcode (multiple qualities)
  3. Extract features (AI/ML)
    • Objects, faces, text
    • Audio analysis
  4. Generate thumbnails
  5. Store metadata
  6. Push to CDN

Recommendation System:

  • Real-time feature extraction
  • Batch model training
  • Online serving with low latency
  • A/B testing new models

Scalability:

  • Geo-distributed CDNs
  • Separate hot/cold storage
  • Pre-fetch next videos in feed

Interview Focus:

  • Recommendation algorithm
  • Video processing pipeline
  • Infinite scroll implementation
  • Content moderation at scale

14. Design Spotify / Music Streaming 🔥🔥

Difficulty: Hard | Frequency: High

Functional Requirements:

  • Stream music
  • Search songs, artists, albums
  • Playlists
  • Recommendations
  • Offline download
  • Social features (share, follow)

Non-Functional Requirements:

  • Millions of songs
  • Low latency streaming
  • High availability
  • Personalization

Key Components:

Music Metadata Service
Streaming Service
Recommendation Engine
Playlist Service
CDN
Storage (S3)

Music Streaming:

  • Audio formats (MP3, AAC, Ogg Vorbis)
  • Multiple bitrates (96, 128, 320 kbps)
  • Chunked streaming (similar to HLS)
  • Pre-fetching next songs
  • Offline caching

Recommendation:

  • Collaborative filtering
  • Audio feature analysis
  • User listening history
  • Playlist similarity
  • Context-aware (time, mood, activity)

Playlist Management:

  • User-created playlists
  • Algorithm-generated playlists
    • Discover Weekly
    • Release Radar
    • Daily Mix

Scalability:

  • CDN for music files
  • Cache popular songs at edge
  • Separate recommendation service

Interview Focus:

  • Streaming optimization
  • Recommendation algorithm
  • Offline mode implementation
  • Social features integration

Category F: Booking & Reservation (Important)

15. Design Airbnb / Hotel Booking 🔥🔥

Difficulty: Hard | Frequency: High

Functional Requirements:

  • Search properties (location, dates, guests)
  • View property details
  • Booking and payment
  • Reviews and ratings
  • Host management
  • Calendar management

Non-Functional Requirements:

  • Global scale
  • Accurate availability
  • Prevent double booking
  • Search performance

Key Components:

Search Service (Elasticsearch)
Booking Service
Payment Service
Calendar Service
Review Service
Recommendation Engine

Search:

  • Geospatial search (lat, long, radius)
  • Filters (price, amenities, property type)
  • Ranking algorithm:
    • Price
    • Reviews
    • Availability
    • Host responsiveness
    • Cancellation policy

Booking Flow:

  1. User selects dates
  2. Check availability (distributed lock)
  3. Reserve for 15 minutes
  4. Payment processing
  5. Confirm booking
  6. Update calendar
  7. Send confirmation

Calendar Management:

  • Availability calendar per property
  • Block dates for bookings
  • Handle cancellations
  • Sync with external calendars (iCal)

Prevent Double Booking:

  • Distributed locks (Redis)
  • Database transactions
  • Optimistic locking
  • Reservation expiry
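Here is a minimal Python sketch of the hold-then-confirm pattern behind these bullets. It assumes a hypothetical per-property `PropertyCalendar`, and a `threading.Lock` stands in for the row lock or Redis lock. A night is unavailable if it is either booked or covered by an unexpired hold. The 15-minute hold matches step 3 of the booking flow.

```python
import threading
import time
from datetime import date, timedelta


class PropertyCalendar:
    """Hypothetical per-property calendar guarding against double booking."""

    def __init__(self, hold_seconds: float = 15 * 60):
        self._lock = threading.Lock()   # stands in for a DB row lock / Redis lock
        self._booked = {}               # night (date) -> booking_id
        self._holds = {}                # booking_id -> (nights, expires_at)
        self._hold_seconds = hold_seconds

    @staticmethod
    def _nights(check_in: date, check_out: date):
        """Nights occupied by a stay: check_in inclusive, check_out exclusive."""
        return [check_in + timedelta(days=i) for i in range((check_out - check_in).days)]

    def _blocked(self, now):
        held = {d for nights, exp in self._holds.values() if exp > now for d in nights}
        return held | set(self._booked)

    def hold(self, booking_id, check_in, check_out) -> bool:
        """Atomically check every night and hold all of them, or none."""
        nights = self._nights(check_in, check_out)
        with self._lock:
            now = time.monotonic()
            if not nights or self._blocked(now) & set(nights):
                return False
            self._holds[booking_id] = (nights, now + self._hold_seconds)
            return True

    def confirm(self, booking_id) -> bool:
        """Call after payment succeeds; fails if the hold has already expired."""
        with self._lock:
            nights, expires_at = self._holds.pop(booking_id, (None, 0.0))
            if nights is None or expires_at <= time.monotonic():
                return False
            for d in nights:
                self._booked[d] = booking_id
            return True
```

Treating the check-out day as free lets back-to-back stays share a changeover date. An expired hold simply stops blocking its nights, so no cleanup job is needed for correctness.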

Database Design:

Properties: property_id, host_id, location, price, amenities
Bookings: booking_id, property_id, user_id, check_in, check_out, status
Calendar: property_id, date, available
Reviews: review_id, property_id, user_id, rating, comment

Scalability:

  • Shard by geography
  • Cache search results
  • Separate booking and search services
  • Async processing for reviews

Interview Focus:

  • Double booking prevention
  • Geospatial search
  • Calendar synchronization
  • Dynamic pricing

16. Design Ticketmaster / Event Booking 🔥🔥

Difficulty: Hard | Frequency: Medium

Functional Requirements:

  • List events
  • Seat selection
  • Ticket booking
  • Payment processing
  • Ticket transfer

Non-Functional Requirements:

  • Handle flash crowds (Taylor Swift effect)
  • Prevent scalping (bots)
  • Fair ticket distribution

Key Components:

Event Service
Seat Selection Service
Queue Service (virtual waiting room)
Payment Service
Anti-bot Service

Flash Sale Handling:

  • Virtual waiting room (queue)
  • Rate limiting per user
  • CAPTCHA
  • Token bucket algorithm
  • Lottery system for high-demand

Seat Locking:

  • Lock seat for 10 minutes during checkout
  • Release if payment fails
  • Distributed lock (Redis)

Anti-bot Measures:

  • CAPTCHA
  • Device fingerprinting
  • Rate limiting
  • Behavioral analysis

Interview Focus:

  • Handle millions of concurrent users
  • Fair ticket distribution
  • Prevent bots and scalpers
  • Seat locking mechanism

Category G: Collaborative & Productivity (Nice to Have)

17. Design Google Docs / Collaborative Editor 🔥🔥

Difficulty: Very Hard | Frequency: Medium

Functional Requirements:

  • Real-time collaborative editing
  • Conflict resolution
  • Version history
  • Comments and suggestions
  • Offline mode

Non-Functional Requirements:

  • Multiple users editing simultaneously
  • Eventual consistency
  • Low latency (<100ms)
  • Data persistence

Key Components:

WebSocket Server
Operational Transformation (OT) Engine
Conflict Resolution Service
Version Control Service
Storage Service

Operational Transformation (OT):

  • Transform operations to handle conflicts
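  • Example:
    • User A inserts "X" at position 5
    • User B concurrently deletes the character at position 3
    • B's delete needs no change (it precedes A's insert), but A's insert must be transformed to position 4 when applied after B's delete
    • Both users converge to the same document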
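The example can be checked with a minimal Python sketch of OT for single-character inserts and deletes, using 0-based positions. The `site` field is an assumed tie-breaker for two inserts at the same position. Production OT handles multi-character operations and server-side ordering, which this sketch omits.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Op:
    kind: str        # "insert", "delete", or "noop"
    pos: int         # 0-based position in the document
    char: str = ""   # character being inserted (inserts only)
    site: int = 0    # client id, breaks ties between inserts at the same position


def apply(doc: str, op: Op) -> str:
    if op.kind == "insert":
        return doc[:op.pos] + op.char + doc[op.pos:]
    if op.kind == "delete":
        return doc[:op.pos] + doc[op.pos + 1:]
    return doc


def transform(op: Op, against: Op) -> Op:
    """Rewrite `op` so it can be applied after the concurrent `against` was applied."""
    if op.kind == "noop" or against.kind == "noop":
        return op
    if against.kind == "insert":
        shifts = (op.pos > against.pos
                  or (op.pos == against.pos
                      and (op.kind == "delete" or against.site < op.site)))
        return replace(op, pos=op.pos + 1) if shifts else op
    # `against` is a delete
    if op.pos > against.pos:
        return replace(op, pos=op.pos - 1)
    if op.pos == against.pos and op.kind == "delete":
        return replace(op, kind="noop")   # both users deleted the same character
    return op
```

Each site applies its own operation first and then the other user's operation transformed against it. Starting from `"abcdefgh"`, both orders yield `"abceXfgh"`.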

Alternative: CRDT (Conflict-free Replicated Data Types)

  • Data types whose concurrent updates merge deterministically, so all replicas converge without transforming operations
  • Used by modern systems
  • Examples: Yjs, Automerge

Real-time Sync:

  1. User types → Send operation to server
  2. Server broadcasts to all connected users
  3. Apply OT/CRDT to resolve conflicts
  4. Update document
  5. Acknowledge to all users

Version History:

  • Snapshot every N operations
  • Store diffs between versions
  • Restore to any previous version

Scalability:

  • Route all collaborators on a document to the same WebSocket server (sticky routing by doc_id)
  • Shard documents by doc_id
  • Eventual consistency model

Interview Focus:

  • Operational Transformation vs CRDT
  • Conflict resolution algorithm
  • Real-time sync architecture
  • Version control strategy

18. Design Dropbox / Google Drive 🔥🔥

Difficulty: Hard | Frequency: High

Functional Requirements:

  • Upload/download files
  • Sync across devices
  • File sharing
  • Version history
  • Offline access

Non-Functional Requirements:

  • Reliable file sync
  • Efficient bandwidth usage
  • Storage optimization
  • High availability

Key Components:

Sync Service
Metadata Service
Block Storage (S3)
Notification Service
Client Application

File Synchronization:

  • Chunking (4MB blocks)
  • Delta sync (only changed blocks)
  • Deduplication (same file hash)
  • Compression

Sync Algorithm:

  1. Client hashes local files
  2. Send hashes to server
  3. Server compares with stored hashes
  4. Only upload changed blocks
  5. Server reconstructs file
  6. Notify other devices
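Steps 1-4 of the sync algorithm can be sketched in Python with `hashlib`. The client hashes fixed-size blocks and uploads only blocks whose hash the server has never seen. This one check gives both delta sync and cross-file deduplication. SHA-256 is an assumed choice of hash.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4MB blocks


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """SHA-256 of each fixed-size block, in order (the file's block manifest)."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def blocks_to_upload(local: bytes, server_hashes: set[str],
                     block_size: int = BLOCK_SIZE) -> list[int]:
    """Indices of local blocks the server does not already store."""
    return [i for i, h in enumerate(block_hashes(local, block_size))
            if h not in server_hashes]
```

One design caveat: with fixed-size blocks, inserting a byte near the start of a file shifts every later block boundary and changes all their hashes. Content-defined chunking with a rolling hash avoids this at the cost of variable block sizes.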

Metadata vs Data:

  • Metadata: filename, path, size, modified date (SQL)
  • Data: actual file content (Object storage)

Conflict Resolution:

  • Last write wins (with timestamp)
  • Create conflict copy (Filename_conflict_copy)
  • User resolves manually

Scalability:

  • Deduplicate at block level
  • Compress files
  • CDN for downloads
  • Separate metadata and file storage

Interview Focus:

  • Block-level deduplication
  • Delta sync algorithm
  • Conflict resolution
  • Offline mode implementation

Category H: Payment & Financial (Nice to Have)

19. Design Paytm / Payment Wallet 🔥🔥

Difficulty: Hard | Frequency: Medium

Functional Requirements:

  • Add money to wallet
  • Send money to users
  • Pay merchants
  • Transaction history
  • Offers and cashback

Non-Functional Requirements:

  • Strong consistency (money)
  • ACID transactions
  • High availability
  • Audit trail

Key Components:

Wallet Service
Transaction Service
Payment Gateway
Ledger Service (double-entry bookkeeping)
Notification Service

Transaction Flow:

  1. User initiates payment
  2. BEGIN TRANSACTION, then validate balance (locking the sender's wallet row)
  3. Debit sender account
  4. Credit receiver account
  5. Record in ledger
  6. COMMIT or ROLLBACK
  7. Send notifications

Double-Entry Bookkeeping:

Transaction: A sends ₹100 to B
Debit: A's account -₹100
Credit: B's account +₹100
Must balance: -₹100 + ₹100 = 0

Idempotency:

  • Same request twice shouldn't charge twice
  • Use unique transaction ID
  • Check for duplicate before processing
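The transaction flow, double-entry ledger, and idempotency check fit together in one database transaction. Below is a sketch using Python's built-in `sqlite3` with tables mirroring the schema below. Assumptions: amounts are integer paise (₹100 = 10000), the client-supplied `txn_id` doubles as the idempotency key, and `BEGIN IMMEDIATE` stands in for the row locking a production database would use.

```python
import sqlite3


def open_wallet_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:", isolation_level=None)  # explicit BEGIN/COMMIT below
    db.executescript("""
        CREATE TABLE wallets (wallet_id TEXT PRIMARY KEY,
                              balance INTEGER NOT NULL CHECK (balance >= 0));
        CREATE TABLE transactions (txn_id TEXT PRIMARY KEY, from_wallet TEXT, to_wallet TEXT,
                                   amount INTEGER NOT NULL, status TEXT NOT NULL);
        CREATE TABLE ledger (entry_id INTEGER PRIMARY KEY AUTOINCREMENT, txn_id TEXT NOT NULL,
                             wallet_id TEXT NOT NULL, direction TEXT NOT NULL,
                             amount INTEGER NOT NULL);
    """)
    return db


def transfer(db: sqlite3.Connection, txn_id: str, src: str, dst: str, amount: int) -> str:
    """Idempotent transfer: a repeated txn_id returns the original outcome, charging nothing."""
    db.execute("BEGIN IMMEDIATE")  # take the write lock before reading anything
    try:
        row = db.execute("SELECT status FROM transactions WHERE txn_id = ?", (txn_id,)).fetchone()
        if row is not None:
            db.execute("COMMIT")
            return row[0]                      # duplicate request
        debited = db.execute(                  # validate balance and debit in one statement
            "UPDATE wallets SET balance = balance - ? WHERE wallet_id = ? AND balance >= ?",
            (amount, src, amount)).rowcount
        status = "SUCCESS" if debited == 1 else "FAILED"
        if debited == 1:
            credited = db.execute("UPDATE wallets SET balance = balance + ? WHERE wallet_id = ?",
                                  (amount, dst)).rowcount
            if credited != 1:
                raise ValueError(f"unknown wallet {dst}")
            db.executemany(                    # double entry: one debit, one credit
                "INSERT INTO ledger (txn_id, wallet_id, direction, amount) VALUES (?, ?, ?, ?)",
                [(txn_id, src, "DEBIT", amount), (txn_id, dst, "CREDIT", amount)])
        db.execute("INSERT INTO transactions VALUES (?, ?, ?, ?, ?)",
                   (txn_id, src, dst, amount, status))
        db.execute("COMMIT")
        return status
    except Exception:
        db.execute("ROLLBACK")                 # all-or-nothing: no half-applied transfer
        raise
```

A reconciliation job can then verify that, across the ledger, credits minus debits sum to zero.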

Database Design:

Wallets: wallet_id, user_id, balance
Transactions: txn_id, from_wallet, to_wallet, amount, status, timestamp
Ledger: entry_id, txn_id, wallet_id, debit/credit, amount

Scalability:

  • Shard by user_id
  • Read replicas for transaction history
  • Strong consistency for wallet balance (master DB)
  • Event sourcing for audit trail

Interview Focus:

  • ACID transaction guarantees
  • Idempotency handling
  • Double-entry bookkeeping
  • Reconciliation system

20. Design Stock Exchange / Trading Platform 🔥

Difficulty: Very Hard | Frequency: Low

Functional Requirements:

  • Place orders (market, limit)
  • Match orders
  • Real-time price updates
  • Order book
  • Portfolio management

Non-Functional Requirements:

  • Ultra-low latency (<1ms)
  • High throughput (millions of orders/sec)
  • Strong consistency
  • Fair order matching

Key Components:

Order Matching Engine
Order Book
Market Data Feed
Risk Management
Clearing and Settlement

Order Matching:

  • Price-Time Priority
  • Order book: bids and asks kept sorted by price (binary heap or sorted map of price levels)
  • FIFO within each price level (time priority)
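Price-time priority can be sketched in Python with two binary heaps keyed on `(price, arrival sequence)`. The best price wins, and among equal prices the earliest order wins. The sketch handles limit orders only and trades at the resting order's price. Prices are assumed to be integer ticks to avoid float rounding. A market order could be modelled as a limit order at an extreme price.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass
class Order:
    order_id: str
    side: str      # "buy" or "sell"
    price: int     # limit price in integer ticks
    qty: int


class OrderBook:
    """Limit-order book with price-time priority."""

    def __init__(self):
        self._bids = []                 # heap of (-price, seq, order): highest bid on top
        self._asks = []                 # heap of (price, seq, order): lowest ask on top
        self._seq = itertools.count()   # arrival order = time priority

    def submit(self, order: Order) -> list[tuple[str, str, int, int]]:
        """Match against the opposite side, rest any remainder; returns (buy_id, sell_id, price, qty)."""
        fills = []
        is_buy = order.side == "buy"
        book, opposite = (self._bids, self._asks) if is_buy else (self._asks, self._bids)
        while order.qty > 0 and opposite:
            resting = opposite[0][2]
            crosses = order.price >= resting.price if is_buy else order.price <= resting.price
            if not crosses:
                break
            traded = min(order.qty, resting.qty)
            buy_id, sell_id = ((order.order_id, resting.order_id) if is_buy
                               else (resting.order_id, order.order_id))
            fills.append((buy_id, sell_id, resting.price, traded))  # trade at resting price
            order.qty -= traded
            resting.qty -= traded
            if resting.qty == 0:
                heapq.heappop(opposite)
        if order.qty > 0:
            key = -order.price if is_buy else order.price
            heapq.heappush(book, (key, next(self._seq), order))
        return fills
```

The sequence number inside the heap key is what makes ties FIFO, and it also guarantees two `Order` objects are never compared directly.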

Order Types:

  • Market order (execute immediately at best price)
  • Limit order (execute at specified price or better)
  • Stop order
  • Good-till-cancelled (GTC)

Scalability:

  • In-memory matching engine (C++)
  • Low-latency network (kernel bypass, RDMA)
  • Separate matching engine per symbol
  • Hot/cold data separation

Interview Focus:

  • Order matching algorithm
  • Latency optimization techniques
  • Fair order execution
  • Risk management

6️⃣ System Components Deep Dive 🟡

1. Content Delivery Network (CDN) 🔥🔥

Purpose:

  • Serve static content closer to users
  • Reduce latency
  • Reduce origin server load
  • DDoS protection

How it Works:

  1. User requests image from CDN
  2. CDN checks if cached at edge
  3. If yes, serve from edge (cache hit)
  4. If no, fetch from origin, cache, and serve (cache miss)
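The four steps map onto a cache-aside lookup at each edge node. Below is a Python sketch of one edge node as an LRU cache with a per-object TTL in front of an origin fetch. The capacity, TTL, and injectable clock are assumptions for illustration. Real CDNs take TTLs from `Cache-Control` headers.

```python
import time
from collections import OrderedDict


class EdgeCache:
    """One CDN edge node: LRU cache with a TTL in front of an origin fetch function."""

    def __init__(self, fetch_from_origin, capacity=1000, ttl_seconds=3600.0, clock=time.monotonic):
        self._fetch = fetch_from_origin
        self._capacity = capacity
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = OrderedDict()   # url -> (content, expires_at), least recent first
        self.hits = 0
        self.misses = 0

    def get(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and entry[1] > now:     # cached and fresh: serve from edge
            self._entries.move_to_end(url)
            self.hits += 1
            return entry[0]
        self.misses += 1                             # missing or stale: go to origin
        content = self._fetch(url)
        self._entries[url] = (content, now + self._ttl)
        self._entries.move_to_end(url)
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)        # evict least recently used
        return content
```

A high hit ratio is what delivers the "reduce origin server load" benefit. Each miss is a round trip to the origin.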

Popular CDNs:

  • Cloudflare
  • Akamai
  • Amazon CloudFront
  • Fastly

Use Cases:

  • Images, videos
  • JavaScript, CSS files
  • Downloadable content

2. Reverse Proxy 🔥

Purpose:

  • Load balancing
  • SSL termination
  • Caching
  • Security (hide backend)

Examples: NGINX, HAProxy


3. API Gateway 🔥🔥

Purpose:

  • Single entry point for all clients
  • Authentication and authorization
  • Rate limiting
  • Request routing
  • Response aggregation
  • API versioning

Examples:

  • Kong
  • AWS API Gateway
  • Apigee

4. Service Mesh 🔥

Purpose:

  • Microservice communication management
  • Service discovery
  • Load balancing
  • Observability
  • Security (mTLS)

Examples:

  • Istio
  • Linkerd
  • Consul

5. Distributed Locking 🔥🔥

Purpose:

  • Coordinate access to shared resources
  • Prevent race conditions

Implementations:

  • Redis (Redlock algorithm)
  • ZooKeeper
  • etcd
  • Database-based locks

Use Cases:

  • Preventing double booking
  • Leader election
  • Distributed cron jobs

6. Rate Limiting 🔥🔥

Algorithms:

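  1. Token Bucket - Allows bursts up to the bucket capacity while enforcing an average rate
  2. Leaky Bucket - Constant outflow (smooths bursts)
  3. Fixed Window - Simple, but allows up to twice the limit across a window boundary
  4. Sliding Window - More accurate (timestamp log or weighted counters)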
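Two of these algorithms are sketched below as local, in-memory Python limiters with an injectable clock. A distributed version would keep the same state in Redis. The class names are illustrative. With a sliding-window log, the login rule from the use cases below would be `SlidingWindowLog(limit=5, window=900)`.

```python
import time
from collections import deque


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class SlidingWindowLog:
    """At most `limit` requests in any trailing `window` seconds (one timestamp per request)."""

    def __init__(self, limit: int, window: float, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self._clock = clock
        self._log = deque()

    def allow(self) -> bool:
        now = self._clock()
        while self._log and self._log[0] <= now - self.window:
            self._log.popleft()          # drop requests that left the window
        if len(self._log) < self.limit:
            self._log.append(now)
            return True
        return False
```

The log variant is exact but stores one timestamp per request. Weighted-counter sliding windows approximate it in constant memory.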

Implementation:

  • Redis counters
  • In-memory (local rate limiting)
  • Distributed (global rate limiting)

Use Cases:

  • API rate limiting (1000 requests/hour)
  • Login attempts (5 attempts/15 minutes)
  • Payment processing

7. Distributed Tracing 🔥

Purpose:

  • Track requests across microservices
  • Performance monitoring
  • Debugging

Tools:

  • Jaeger
  • Zipkin
  • AWS X-Ray

Concepts:

  • Trace ID (spans entire request)
  • Span ID (individual service call)

8. Circuit Breaker 🔥

Purpose:

  • Prevent cascading failures
  • Fail fast when service is down
  • Give service time to recover

States:

  1. Closed (normal operation)
  2. Open (service failing, reject requests)
  3. Half-Open (test if service recovered)
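The three states can be sketched as a small Python state machine, assuming a consecutive-failure threshold and a fixed reset timeout. Resilience4j and similar libraries offer richer policies such as failure-rate windows. This sketch is single-threaded, so concurrent callers could all slip through while the breaker is half-open.

```python
import time


class CircuitOpenError(Exception):
    """Raised instead of calling the downstream service while the circuit is open."""


class CircuitBreaker:
    """Closed -> Open after `failure_threshold` consecutive failures; Open -> Half-Open after
    `reset_timeout` seconds; a successful trial closes it, a failed trial re-opens it."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self.state = "closed"
        self._failures = 0
        self._opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("failing fast")   # give the service time to recover
            self.state = "half-open"                      # allow a trial request through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self._failures += 1
        if self.state == "half-open" or self._failures >= self.failure_threshold:
            self.state = "open"
            self._opened_at = self._clock()

    def _on_success(self):
        self._failures = 0
        self.state = "closed"
```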

Tools:

  • Hystrix (deprecated but concept important)
  • Resilience4j

9. Service Discovery 🔥

Purpose:

  • Find service instances dynamically
  • Handle dynamic scaling
  • Health checks

Types:

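  • Client-side discovery: the client queries the registry and picks an instance (Netflix Eureka)
  • Server-side discovery: a load balancer or router queries the registry on the client's behalf (AWS ELB, Kubernetes Services)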

Examples:

  • Consul
  • Eureka
  • ZooKeeper
  • etcd

10. Time-Series Database 🔥

Purpose:

  • Store metrics and logs
  • Time-based queries
  • Aggregations

Examples:

  • InfluxDB
  • TimescaleDB
  • Prometheus

Use Cases:

  • Application metrics
  • Server monitoring
  • IoT sensor data

11. Full-Text Search Engine 🔥🔥

Elasticsearch Deep Dive:

Key Concepts:

  • Documents (JSON objects)
  • Index (collection of documents)
  • Shards (horizontal partitioning)
  • Replicas (copies for availability)

Inverted Index:

"quick brown fox" → tokenize → [quick, brown, fox]
Index:
quick → [doc1, doc5]
brown → [doc1, doc3]
fox → [doc1, doc2, doc5]

Query Types:

  • Match query (full-text search)
  • Term query (exact match)
  • Bool query (combine multiple queries)
  • Range query (dates, numbers)

Scoring:

  • TF-IDF (Term Frequency-Inverse Document Frequency)
  • BM25 (adds term-frequency saturation and document-length normalization; Elasticsearch's default since 5.0)

Use Cases:

  • Product search
  • Log aggregation (ELK stack)
  • Application search

12. Object Storage 🔥🔥

S3 Deep Dive:

Features:

  • Store any type of file
  • Unlimited storage
  • 99.999999999% (11 9's) durability
  • Bucket and object model

Storage Classes:

  • S3 Standard (frequent access)
  • S3 Infrequent Access (IA)
  • S3 Glacier (archival)

Use Cases:

  • Media files (images, videos)
  • Backups
  • Data lakes
  • Static website hosting

Best Practices:

  • Use CloudFront CDN
  • Enable versioning
  • Lifecycle policies
  • Pre-signed URLs for secure access

13. Graph Databases 🔥

Purpose:

  • Store relationships efficiently
  • Graph traversal queries

Examples:

  • Neo4j
  • Amazon Neptune
  • ArangoDB

Use Cases:

  • Social networks (friend relationships)
  • Recommendation engines
  • Fraud detection
  • Knowledge graphs

When to Use:

  • Many-to-many relationships
  • Complex join queries in SQL
  • Path finding problems

14. Vector Databases 🔥 (New in 2024-25)

Purpose:

  • Store embeddings (vectors)
  • Semantic search
  • Similarity search

Examples:

  • Pinecone
  • Weaviate
  • Milvus
  • Qdrant

Use Cases:

  • AI/ML applications
  • Recommendation systems
  • Image similarity
  • Semantic search
  • RAG (Retrieval Augmented Generation) for LLMs

Why Important:

  • Rise of LLMs and AI applications
  • Vector embeddings for semantic meaning

15. Streaming Platforms 🔥🔥

Apache Kafka Deep Dive:

Key Concepts:

  • Topics (channels)
  • Partitions (parallel processing)
  • Producers (write)
  • Consumers (read)
  • Consumer Groups (load balancing)

Use Cases:

  • Real-time analytics
  • Log aggregation
  • Event sourcing
  • CDC (Change Data Capture)

Kafka vs Message Queue:

  • Kafka: High throughput, persistent, replay
  • MQ: Lower latency, transient, no replay

Other Options:

  • Apache Pulsar
  • Amazon Kinesis
  • Google Pub/Sub

7️⃣ Database Scaling Patterns 🔴

1. Replication 🔥🔥🔥

Master-Slave (Primary-Replica):

  • All writes go to master
  • Reads from replicas
  • Asynchronous replication
  • Replication lag possible

Use Cases:

  • Read-heavy applications
  • Analytics on replicas
  • Geographic distribution

Master-Master:

  • Both can accept writes
  • Conflict resolution needed
  • More complex

2. Sharding (Horizontal Partitioning) 🔥🔥🔥

Sharding Strategies:

1. Range-Based Sharding:

  • Users A-M → Shard 1
  • Users N-Z → Shard 2
  • Pros: Simple, range queries easy
  • Cons: Uneven distribution (hotspots)

2. Hash-Based Sharding:

  • Hash(user_id) % num_shards
  • Pros: Even distribution
  • Cons: Range queries difficult, resharding hard

3. Consistent Hashing:

  • Virtual nodes on hash ring
  • Pros: Minimal data movement when scaling
  • Cons: More complex

4. Directory-Based:

  • Lookup table maps keys to shards
  • Pros: Flexible
  • Cons: Single point of failure (directory service)

Challenges:

  • Cross-shard queries
  • Distributed transactions
  • Resharding (when adding shards)
  • Hotspot handling

3. Partitioning (Vertical) 🔥

Split tables by columns:

  • User basic info → Partition 1
  • User extended profile → Partition 2

Benefits:

  • Reduce I/O
  • Different storage types for different data

4. Denormalization 🔥🔥

Purpose:

  • Optimize read performance
  • Reduce joins

Trade-off:

  • Faster reads
  • Slower writes
  • Data duplication
  • Consistency challenges

Example:

Normalized:
Users: user_id, name
Posts: post_id, user_id, content

Denormalized:
Posts: post_id, user_id, user_name, content
(user_name duplicated)

5. CQRS (Command Query Responsibility Segregation) 🔥

Concept:

  • Separate read and write models
  • Optimize each independently

Architecture:

Write Model (Commands) → PostgreSQL (normalized)
        ↓ (sync via events)
Read Model (Queries) → Elasticsearch (denormalized)

Use Cases:

  • Complex domain logic
  • Read-heavy with complex queries
  • Event sourcing

1. Serverless Architecture 🔥

AWS Lambda, Google Cloud Functions:

  • No server management
  • Auto-scaling
  • Pay per invocation

Use Cases:

  • Event-driven tasks
  • Scheduled jobs
  • API backends (with API Gateway)

Limitations:

  • Cold start latency
  • Execution time limits (15 min AWS Lambda)
  • Vendor lock-in

2. Edge Computing 🔥

Concept:

  • Process data closer to users
  • Reduce latency
  • Cloudflare Workers, AWS Lambda@Edge

Use Cases:

  • A/B testing at edge
  • Personalization
  • Bot detection
  • Image optimization

3. Event-Driven Architecture 🔥🔥

Components:

  • Event producers
  • Event bus (Kafka, SNS, EventBridge)
  • Event consumers

Benefits:

  • Loose coupling
  • Scalability
  • Async processing

Patterns:

  • Event Notification
  • Event-Carried State Transfer
  • Event Sourcing
  • CQRS

4. Data Lakes & Warehouses 🔥

Data Lake:

  • Store raw data (all formats)
  • S3, Azure Data Lake
  • Schema-on-read

Data Warehouse:

  • Structured data
  • Optimized for analytics
  • Redshift, Snowflake, BigQuery
  • Schema-on-write

Modern: Data Lakehouse:

  • Combines benefits of both
  • Delta Lake, Apache Iceberg

5. Real-Time Analytics 🔥

Stream Processing:

  • Apache Flink
  • Apache Spark Streaming
  • Kafka Streams

Use Cases:

  • Real-time dashboards
  • Fraud detection
  • Anomaly detection
  • Real-time recommendations

6. Multi-Tenancy 🔥

Approaches:

1. Separate Database per Tenant:

  • Pros: Isolation, easy backup
  • Cons: Expensive, harder to scale

2. Shared Database, Separate Schema:

  • Pros: Medium isolation
  • Cons: Schema management

3. Shared Database, Shared Schema:

  • Pros: Cost-effective, easy to scale
  • Cons: Less isolation, tenant_id in every table

Considerations:

  • Data isolation
  • Performance isolation
  • Compliance requirements

7. Feature Flags / Toggles 🔥

Purpose:

  • Deploy features disabled
  • Enable for specific users
  • A/B testing
  • Gradual rollout
  • Kill switch
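Gradual rollout is usually done by deterministic bucketing, so each user's experience stays stable as the percentage grows. Below is a Python sketch under assumptions: a hypothetical config shape (`enabled`, `allow`, `percent`) and SHA-256 bucketing into 10,000 buckets. Hosted tools such as LaunchDarkly use their own schemes.

```python
import hashlib


def in_rollout(flag: str, user_id: str, percent: float) -> bool:
    """Hash (flag, user) into one of 10,000 buckets; enabled if bucket < percent% of them."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 10_000   # 0..9999
    return bucket < percent * 100                         # percent in [0, 100]


def is_enabled(flag: str, user_id: str, config: dict) -> bool:
    """config (hypothetical shape): {"enabled": bool, "allow": set_of_user_ids, "percent": float}."""
    if not config.get("enabled", False):          # kill switch
        return False
    if user_id in config.get("allow", set()):     # enable for specific users
        return True
    return in_rollout(flag, user_id, config.get("percent", 0.0))
```

Because a bucket below 10% of the range is also below 50%, users who saw the feature at 10% keep it when the rollout widens. Including the flag name in the hash gives each flag an independent user sample.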

Tools:

  • LaunchDarkly
  • Split.io
  • Unleash
  • Custom (Redis-based)

8. Chaos Engineering 🔥

Concept:

  • Intentionally inject failures
  • Test system resilience
  • Identify weaknesses

Tools:

  • Chaos Monkey (Netflix)
  • Gremlin
  • Chaos Mesh

Practices:

  • Random instance termination
  • Network latency injection
  • Disk failure simulation

9. Observability (O11y) 🔥🔥

Three Pillars:

1. Metrics:

  • Numerical measurements
  • Prometheus, Grafana
  • Examples: CPU, memory, request count

2. Logs:

  • Discrete events
  • ELK Stack (Elasticsearch, Logstash, Kibana)
  • Splunk, Datadog

3. Traces:

  • Request flow across services
  • Jaeger, Zipkin

Modern: OpenTelemetry:

  • Unified standard for metrics, logs, traces

10. AI/ML Integration in System Design 🔥🔥 (2025 Trend)

Common ML Components:

1. Recommendation Systems:

  • Collaborative filtering
  • Content-based filtering
  • Hybrid approaches
  • Real-time vs batch predictions

2. Search Ranking:

  • Learning to Rank (LTR)
  • Feature engineering
  • Model serving

3. Content Moderation:

  • Image/text classification
  • ML models for harmful content

4. Personalization:

  • User embeddings
  • Context-aware models

ML Serving Architecture:

Client → API Gateway → Model Server (TensorFlow Serving, TorchServe)
↓
Feature Store (Redis, Feast)
↓
Model Registry (MLflow)

Challenges:

  • Model versioning
  • A/B testing models
  • Feature drift
  • Real-time inference latency
  • Model monitoring

9️⃣ Interview Strategy & Framework 🎯

The RESHADED Framework (45-60 min interview)

Timeline:

1. Requirements (5-7 minutes) 🔥🔥🔥

  • Clarify functional requirements
  • Clarify non-functional requirements
  • Ask about scale
  • Identify constraints

Example Questions to Ask:

  • "How many users are we expecting?"
  • "What's the read/write ratio?"
  • "Do we need strong consistency or eventual consistency?"
  • "What's the expected latency?"
  • "Do we need to support offline mode?"
  • "What are the most critical features?"

2. Estimations (5 minutes) 🔥🔥

Back-of-envelope Calculations:

Example: Design Instagram

DAU (Daily Active Users): 500M
Assumptions:
- Each user posts 1 photo/day
- Each photo is 2MB
- Each user views 50 photos/day

Storage:
- Daily: 500M * 1 * 2MB = 1,000 TB/day = 1 PB/day
- Yearly: 1 PB * 365 = 365 PB/year

Bandwidth:
Read:
- 500M * 50 * 2MB / 86400 seconds = ~580 GB/s

Write:
- 500M * 1 * 2MB / 86400 seconds = ~11.6 GB/s

QPS:
Read: 500M * 50 / 86400 = ~289K QPS
Write: 500M * 1 / 86400 = ~5.8K QPS
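The same arithmetic as a small Python helper, handy for re-running the numbers when the interviewer changes an assumption. It uses decimal units (1 MB = 10^6 bytes) to match the figures above. The function and key names are illustrative.

```python
SECONDS_PER_DAY = 86_400
MB, GB, PB = 10**6, 10**9, 10**15  # decimal units, as in the estimate above


def estimate(dau: int, posts_per_user: int, views_per_user: int, photo_bytes: int) -> dict:
    writes_per_day = dau * posts_per_user
    reads_per_day = dau * views_per_user
    bytes_written_per_day = writes_per_day * photo_bytes
    return {
        "storage_pb_per_day": bytes_written_per_day / PB,
        "storage_pb_per_year": bytes_written_per_day * 365 / PB,
        "read_gb_per_s": reads_per_day * photo_bytes / SECONDS_PER_DAY / GB,
        "write_gb_per_s": bytes_written_per_day / SECONDS_PER_DAY / GB,
        "read_qps": reads_per_day / SECONDS_PER_DAY,
        "write_qps": writes_per_day / SECONDS_PER_DAY,
    }


est = estimate(dau=500_000_000, posts_per_user=1, views_per_user=50, photo_bytes=2 * MB)
for name, value in est.items():
    print(f"{name:>20}: {value:,.1f}")
```

Changing one argument, for example `views_per_user=100`, immediately shows that read bandwidth and read QPS scale linearly with it. This makes it easy to tell the interviewer which assumption drives the design.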

Memory Estimates (80-20 Rule):

  • Assume 20% of the content read each day generates 80% of the requests
  • Cache that hot 20%, so roughly 80% of requests are served from cache

Useful Numbers:

1 Million requests/day = ~12 requests/second
1 Billion requests/day = ~12K requests/second
1 Petabyte = 1,000 Terabytes = 1,000,000 Gigabytes
1 Day = 86,400 seconds

3. System Interface / API Design (5 minutes) 🔥🔥

Define APIs:

Example: Twitter

POST /api/v1/tweets
Body: { user_id, content, media_urls }
Response: { tweet_id, created_at }

GET /api/v1/timeline/{user_id}
Params: page, limit
Response: { tweets: [...], next_page_token }

POST /api/v1/follow
Body: { follower_id, followee_id }
Response: { success: true }

GET /api/v1/search
Params: query, page, limit
Response: { tweets: [...], users: [...] }

Important:

  • Define request/response structure
  • Mention authentication (JWT, OAuth)
  • Versioning (/api/v1/)
  • Rate limiting

4. High-Level Design (10-15 minutes) 🔥🔥🔥

Draw Architecture Diagram:

Components to Include:

  1. Client (Web/Mobile)
  2. Load Balancer
  3. API Gateway
  4. Application Servers
  5. Caches (Redis)
  6. Databases (SQL/NoSQL)
  7. Object Storage (S3)
  8. CDN
  9. Message Queue (Kafka)
  10. Search Service (Elasticsearch)

Example Flow:

Mobile App → Load Balancer → API Gateway
↓
App Servers → Redis Cache
↓ ↓
Database ← (cache miss)
↓
Kafka → Workers
↓
S3 (media files)

Key Points:

  • Explain each component's purpose
  • Show data flow with arrows
  • Mention protocols (HTTP, WebSocket, gRPC)
  • Talk about data storage choices

5. Detailed Design (15-20 minutes) 🔥🔥🔥

Deep Dive into 2-3 Core Components:

The interviewer may ask:

  • "How would you implement the feed generation?"
  • "Design the database schema"
  • "How would you handle real-time updates?"

Choose components to detail:

  • Most critical features
  • Challenging technical problems
  • Areas you're strong in

Example: Twitter Timeline Generation

Approach 1: Fan-out on Write (Push)

User tweets → Write to all followers' timelines
Pros: Fast reads
Cons: Slow writes for celebrities, wasted space

When to use: Users with < 10K followers

Approach 2: Fan-out on Read (Pull)

User requests timeline → Fetch tweets from followed users
Pros: Fast writes, no wasted space
Cons: Slow reads

When to use: Celebrities with millions of followers

Approach 3: Hybrid

Normal users: Fan-out on write
Celebrities: Fan-out on read
Best of both worlds
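Here is a single-process sketch of the hybrid approach. The 10K-follower threshold follows the rule of thumb above. Dictionaries stand in for the timeline cache (for example, Redis lists) and the tweet store, and all names are hypothetical. For simplicity it ignores users who cross the threshold after their earlier tweets were already fanned out.

```python
from collections import defaultdict, deque
from itertools import count
from typing import Deque, Dict, List, Set, Tuple

CELEBRITY_THRESHOLD = 10_000  # at or above this follower count, skip fan-out on write
TIMELINE_CAP = 800            # newest N entries kept per precomputed timeline

Tweet = Tuple[int, str]       # (timestamp, text)


class TimelineService:
    def __init__(self) -> None:
        self.followers: Dict[str, Set[str]] = defaultdict(set)
        self.following: Dict[str, Set[str]] = defaultdict(set)
        self.tweets_by_author: Dict[str, List[Tweet]] = defaultdict(list)
        self.timelines: Dict[str, Deque[Tweet]] = defaultdict(lambda: deque(maxlen=TIMELINE_CAP))
        self._clock = count()  # increasing integers stand in for timestamps

    def follow(self, follower: str, followee: str) -> None:
        self.followers[followee].add(follower)
        self.following[follower].add(followee)

    def is_celebrity(self, user: str) -> bool:
        return len(self.followers[user]) >= CELEBRITY_THRESHOLD

    def post(self, author: str, text: str) -> None:
        tweet = (next(self._clock), f"{author}: {text}")
        self.tweets_by_author[author].append(tweet)
        if not self.is_celebrity(author):
            # Fan-out on write: push into every follower's precomputed timeline.
            for follower in self.followers[author]:
                self.timelines[follower].appendleft(tweet)

    def read_timeline(self, user: str, limit: int = 20) -> List[str]:
        merged = list(self.timelines[user])
        # Fan-out on read: pull recent tweets from followed celebrities now.
        for followee in self.following[user]:
            if self.is_celebrity(followee):
                merged.extend(self.tweets_by_author[followee][-limit:])
        merged.sort(reverse=True)  # newest first
        return [text for _, text in merged[:limit]]
```

A celebrity's post costs one append instead of millions of timeline writes. Readers pay a small merge cost only for the few celebrities they follow.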

6. Database Design (5-7 minutes) 🔥🔥

Schema Design:

Example: E-commerce

Users:
- user_id (PK)
- email
- name
- created_at

Products:
- product_id (PK)
- name
- description
- price
- category_id
- stock_quantity

Orders:
- order_id (PK)
- user_id (FK)
- total_amount
- status (pending, paid, shipped, delivered)
- created_at

Order_Items:
- id (PK)
- order_id (FK)
- product_id (FK)
- quantity
- price_at_purchase

Cart:
- user_id (composite PK with product_id)
- product_id (composite PK with user_id)
- quantity
- added_at
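The same schema can be written as SQL DDL. It runs here against in-memory SQLite so the example is self-contained. Several details are assumptions added for illustration: the column types, storing money as integer cents, the CHECK constraints, and the (user_id, created_at) index for "orders of a user, newest first". A production PostgreSQL schema would differ, for example NUMERIC or BIGINT for money and TIMESTAMPTZ for times.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id    INTEGER PRIMARY KEY,
    email      TEXT NOT NULL UNIQUE,
    name       TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE products (
    product_id     INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    description    TEXT,
    price_cents    INTEGER NOT NULL CHECK (price_cents >= 0),
    category_id    INTEGER,
    stock_quantity INTEGER NOT NULL CHECK (stock_quantity >= 0)
);
CREATE TABLE orders (
    order_id           INTEGER PRIMARY KEY,
    user_id            INTEGER NOT NULL REFERENCES users(user_id),
    total_amount_cents INTEGER NOT NULL,
    status             TEXT NOT NULL CHECK (status IN ('pending', 'paid', 'shipped', 'delivered')),
    created_at         TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE order_items (
    id                      INTEGER PRIMARY KEY,
    order_id                INTEGER NOT NULL REFERENCES orders(order_id),
    product_id              INTEGER NOT NULL REFERENCES products(product_id),
    quantity                INTEGER NOT NULL CHECK (quantity > 0),
    price_at_purchase_cents INTEGER NOT NULL
);
CREATE TABLE cart (
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    quantity   INTEGER NOT NULL CHECK (quantity > 0),
    added_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id, product_id)
);
CREATE INDEX idx_orders_user_created ON orders (user_id, created_at);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(SCHEMA)
```

The composite primary key on `cart` makes (user_id, product_id) unique. Adding the same product twice must therefore update `quantity` rather than insert a second row.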

Decisions:

  • SQL vs NoSQL (explain why)
  • Normalization vs denormalization
  • Indexing strategy
  • Sharding key

7. Scalability & Bottlenecks (5-7 minutes) 🔥🔥🔥

Identify Bottlenecks:

  • Database (single primary handles all writes)
  • Application servers
  • Network bandwidth
  • Cache invalidation

Solutions:

Database Bottleneck:

  • Read replicas
  • Sharding
  • Caching

Application Server Bottleneck:

  • Horizontal scaling
  • Load balancing
  • Stateless services

Network Bottleneck:

  • CDN
  • Compression
  • Caching

Storage Bottleneck:

  • Distributed storage
  • Tiered storage (hot/cold)

8. Deep Dives & Trade-offs (5-10 minutes) 🔥🔥

Interviewer may ask:

  • "What if a celebrity with 100M followers tweets?"
  • "How would you handle failures?"
  • "What about data consistency?"

Discuss Trade-offs:

  • Consistency vs Availability when a network partition occurs (CAP)
  • Latency vs Throughput
  • Cost vs Performance
  • Complexity vs Simplicity

Failure Scenarios:

  • Primary database down → Serve reads from replicas and promote a replica to primary
  • Cache down → Fall back to the database (degraded performance)
  • Message queue down → Retry with exponential backoff
  • Network partition → Stay available and accept eventual consistency (an AP choice)

🔟 Common Interview Questions & Answers 🔥

Generic Questions

Q: "SQL vs NoSQL - when to use what?"

Answer:

Use SQL when:
✅ ACID transactions required (banking, e-commerce orders)
✅ Complex queries with JOINs
✅ Structured data
✅ Data integrity is critical

Use NoSQL when:
✅ High write throughput (logging, IoT)
✅ Flexible schema (user profiles)
✅ Horizontal scaling needed
✅ Eventual consistency acceptable
✅ Key-value access patterns

Examples:
- E-commerce orders → SQL (PostgreSQL)
- User sessions → NoSQL (Redis)
- Product catalog → NoSQL (MongoDB)
- Social media feeds → NoSQL (Cassandra)

Q: "How do you prevent race conditions in distributed systems?"

Answer:

1. Distributed Locks (Redis, ZooKeeper)
2. Optimistic Locking (version numbers)
3. Database Transactions (ACID)
4. Idempotency (same request = same result)
5. Atomic operations (INCR in Redis)

Example: Prevent double booking
- Acquire distributed lock on resource_id
- Check availability
- Make booking
- Release lock

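Use Redis: SET lock:{resource_id} {unique_token} NX PX 30000
If it returns OK → lock acquired (it expires after 30 s if the holder crashes)
If it returns nil → lock already held
Release only if the stored value still equals your token (compare-and-delete in a Lua script). Plain SETNX has no expiry, so a crashed holder would block the resource forever.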
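Here is a sketch of the double-booking flow, written against a tiny in-memory stand-in for Redis. The `FakeRedis` class and its method names are hypothetical; they are not redis-py's API. The sketch adds two safeguards that plain SETNX lacks:

- A TTL, so a crashed holder cannot block the resource forever.
- A per-holder token checked on release, so one request never deletes another's lock.

With the real redis-py client, the acquire step would be `r.set(key, token, nx=True, px=ttl_ms)`. The release would be a short Lua script that deletes the key only if its value still equals the token.

```python
import time
import uuid
from typing import Dict, Optional, Tuple


class FakeRedis:
    """In-memory stand-in for the two Redis operations the lock needs:
    SET key value NX PX <ms>, and compare-and-delete (a Lua script in real Redis).
    Single-process and not thread-safe; for illustration only."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def set_nx_px(self, key: str, value: str, ttl_ms: int) -> bool:
        now = time.monotonic()
        current = self._data.get(key)
        if current is not None and current[1] > now:
            return False  # an unexpired lock already exists
        self._data[key] = (value, now + ttl_ms / 1000)
        return True

    def delete_if_equals(self, key: str, value: str) -> bool:
        current = self._data.get(key)
        if current is not None and current[0] == value and current[1] > time.monotonic():
            del self._data[key]
            return True
        return False


def acquire_lock(store: FakeRedis, resource_id: str, ttl_ms: int = 30_000) -> Optional[str]:
    token = uuid.uuid4().hex  # unique per holder
    return token if store.set_nx_px(f"lock:{resource_id}", token, ttl_ms) else None


def release_lock(store: FakeRedis, resource_id: str, token: str) -> bool:
    # Only the holder (same token) may release; an expired lock is left alone.
    return store.delete_if_equals(f"lock:{resource_id}", token)


def book_seat(store: FakeRedis, seats: Dict[str, Optional[str]], seat: str, user: str) -> bool:
    token = acquire_lock(store, seat)
    if token is None:
        return False  # another request is booking this seat right now
    try:
        if seats[seat] is not None:
            return False  # already booked
        seats[seat] = user
        return True
    finally:
        release_lock(store, seat, token)
```

A TTL lock can still expire while a slow holder is mid-operation. For that reason it is common to also guard the final write itself, for example with a unique constraint on (seat, show).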

Q: "How do you handle high traffic / flash sales?"

Answer:

1. Rate Limiting (per user, per IP)
2. Queue System (virtual waiting room)
3. Caching (aggressive caching of product details)
4. CDN (static content)
5. Database Optimization:
- Read replicas
- Connection pooling
6. Horizontal Scaling (auto-scaling)
7. Graceful Degradation:
- Disable non-critical features
- Show cached data
8. Pre-warming Cache
9. Bot Detection (CAPTCHA)

Example: iPhone launch on Amazon
- Queue 1M users → virtual waiting room
- Release in batches (1000 at a time)
- Rate limit checkouts
- Reserve inventory with distributed locks
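Per-user rate limiting, whether on requests in general or on checkouts during a sale, is commonly implemented as a token bucket. Below is a minimal in-process sketch with an injectable clock so it can be tested. The class names are hypothetical. In a multi-server deployment the bucket state would live in a shared store such as Redis rather than a Python dict.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill according to elapsed time, never beyond capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class PerUserLimiter:
    """One bucket per user id, created lazily on first request."""

    def __init__(self, capacity: float, rate: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._buckets: Dict[str, TokenBucket] = {}
        self._capacity, self._rate, self._clock = capacity, rate, clock

    def allow(self, user_id: str) -> bool:
        bucket = self._buckets.get(user_id)
        if bucket is None:
            bucket = self._buckets[user_id] = TokenBucket(self._capacity, self._rate, self._clock)
        return bucket.allow()
```

`capacity` bounds the burst a single user can send, for example a bot hammering "Buy". `rate` bounds their sustained throughput.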

Q: "How do you ensure data consistency across microservices?"

Answer:

1. Saga Pattern (distributed transactions)
- Choreography (event-driven)
- Orchestration (coordinator)

2. Event Sourcing
- Store events, not state
- Replay events to rebuild state

3. 2PC (Two-Phase Commit)
- Coordinator asks: Can you commit?
- All say yes → Commit
- Any says no → Rollback
- Problem: Blocking, coordinator SPOF

4. Eventual Consistency
- Accept temporary inconsistency
- Use message queues for async updates

Example: Order Service + Payment Service + Inventory Service
Saga Pattern:
1. Order Service creates order (pending)
2. Payment Service charges card → Success
3. Inventory Service decrements stock → Success
4. Order Service updates order (confirmed)

If any step fails → Run compensating transactions for the steps already completed, in reverse order (e.g., refund the payment, then cancel the order)
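An orchestration-style saga can be sketched as a list of (action, compensation) pairs. The orchestrator runs the actions in order. If one fails, it runs the compensations of the steps that already succeeded, newest first. The step names and the `state` dict are hypothetical stand-ins for calls to the Order, Payment and Inventory services. A production orchestrator would also persist saga progress and make compensations idempotent and retryable.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]
    compensate: Callable[[], None]


class SagaFailed(Exception):
    pass


def run_saga(steps: List[SagaStep], log: List[str]) -> None:
    """Run steps in order; if one fails, undo the already-completed steps by
    running their compensations in reverse order, then raise SagaFailed."""
    completed: List[SagaStep] = []
    for step in steps:
        try:
            step.action()
        except Exception as exc:
            log.append(f"{step.name} failed: {exc}")
            for done in reversed(completed):
                done.compensate()
                log.append(f"compensated {done.name}")
            raise SagaFailed(step.name) from exc
        completed.append(step)
        log.append(f"{step.name} ok")


# Usage: the order flow above, with the inventory step failing.
state = {"order": None, "charged_cents": 0}
log: List[str] = []


def decrement_stock() -> None:
    raise RuntimeError("out of stock")


steps = [
    SagaStep("create_order", lambda: state.update(order="pending"), lambda: state.update(order="cancelled")),
    SagaStep("charge_card", lambda: state.update(charged_cents=4_999), lambda: state.update(charged_cents=0)),
    SagaStep("decrement_stock", decrement_stock, lambda: None),
    SagaStep("confirm_order", lambda: state.update(order="confirmed"), lambda: None),
]
try:
    run_saga(steps, log)
except SagaFailed:
    pass  # the order ends up cancelled and the charge refunded
```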

Q: "How do you handle failures and ensure reliability?"

Answer:

1. Redundancy
- Multiple instances
- No single point of failure

2. Replication
- Database replicas
- Cross-region replication

3. Health Checks
- Liveness probes
- Readiness probes

4. Circuit Breaker
- Fail fast when service down
- Prevent cascading failures

5. Retry with Exponential Backoff
- Don't overwhelm the failing service
- Add random jitter so clients don't retry in lockstep

6. Bulkhead Pattern
- Isolate resources (thread pools)
- Failure in one area doesn't affect others

7. Graceful Degradation
- Serve cached/stale data
- Disable non-critical features

8. Monitoring & Alerts
- Real-time metrics
- On-call rotation

Q: "How do you optimize database queries?"

Answer:

1. Indexing
- B-tree indexes for range queries
- Hash indexes for equality
- Composite indexes for multiple columns
- Don't over-index (slows writes)

2. Query Optimization
- Use EXPLAIN to analyze
- Avoid SELECT *
- Use JOINs wisely
- Limit result sets

3. Caching
- Cache frequently accessed data
- Redis, Memcached

4. Denormalization
- Pre-compute aggregations
- Duplicate data to avoid JOINs

5. Partitioning
- Horizontal (sharding)
- Vertical (split columns)

6. Read Replicas
- Route reads to replicas

7. Connection Pooling
- Reuse connections

8. Pagination
- Don't fetch all rows at once
- Prefer cursor-based (keyset) pagination for deep pages; offset-based pagination still reads and discards the skipped rows
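Cursor-based (keyset) pagination can be sketched with SQLite from the standard library; the table and column names are hypothetical. The cursor carries the sort key of the last row already returned. The database can then seek directly through the (created_at, id) index instead of scanning and discarding OFFSET rows. The `id` column breaks ties between rows with equal timestamps.

```python
import sqlite3
from typing import List, Optional, Tuple

PageCursor = Tuple[int, int]  # (created_at, id) of the last row already returned

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, created_at INTEGER NOT NULL, title TEXT)")
conn.execute("CREATE INDEX idx_posts_created_id ON posts (created_at, id)")
conn.executemany(
    "INSERT INTO posts (id, created_at, title) VALUES (?, ?, ?)",
    [(i, 1_700_000_000 + i // 3, f"post {i}") for i in range(1, 11)],  # some timestamps repeat
)


def fetch_page(cursor: Optional[PageCursor], limit: int) -> Tuple[List[int], Optional[PageCursor]]:
    """Newest-first keyset pagination over posts."""
    if cursor is None:
        rows = conn.execute(
            "SELECT id, created_at FROM posts ORDER BY created_at DESC, id DESC LIMIT ?",
            (limit,),
        ).fetchall()
    else:
        last_created, last_id = cursor
        rows = conn.execute(
            "SELECT id, created_at FROM posts"
            " WHERE created_at < ? OR (created_at = ? AND id < ?)"
            " ORDER BY created_at DESC, id DESC LIMIT ?",
            (last_created, last_created, last_id, limit),
        ).fetchall()
    # A full page means there may be more rows; hand back the last row's sort key.
    next_cursor = (rows[-1][1], rows[-1][0]) if len(rows) == limit else None
    return [row[0] for row in rows], next_cursor
```

The API would typically return `next_cursor` to the client as an opaque token, such as the `next_page_token` in the timeline API earlier.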

1️⃣1️⃣ Study Plan (12-16 Weeks) 📅

Week 1-2: LLD Fundamentals

Focus: OOP, SOLID, Design Patterns

  • Study SOLID principles with examples
  • Learn 5 key design patterns (Singleton, Factory, Strategy, Observer, Builder)
  • Practice UML diagrams

Practice:

  • Design a Parking Lot
  • Design a Vending Machine
  • Implement Singleton pattern (thread-safe)

Week 3-4: LLD Problems (Easy to Medium)

Focus: Common LLD interview problems

  • Library Management System
  • Hotel Booking System
  • ATM System
  • Chess Game

Practice:

  • Code one problem in your preferred language
  • Draw class diagrams
  • Discuss with peers / post on forums

Week 5-6: HLD Fundamentals

Focus: Core concepts

  • Scalability (horizontal vs vertical)
  • Load balancing
  • Caching strategies
  • Database fundamentals (SQL vs NoSQL)
  • CAP theorem

Practice:

  • Design URL Shortener (simple problem)
  • Estimate storage and bandwidth for various apps

Week 7-8: HLD - Social Media & Content

Focus: High-traffic systems

  • Design Twitter
  • Design Instagram
  • Design YouTube

Practice:

  • Draw architecture diagrams
  • Practice explaining to a friend
  • Mock interviews

Week 9-10: HLD - E-commerce & Booking

Focus: Transaction-heavy systems

  • Design Amazon
  • Design Uber
  • Design Airbnb

Practice:

  • Focus on database schema
  • Consistency and transactions
  • Race condition handling

Week 11-12: HLD - Real-Time & Search

Focus: Real-time and search systems

  • Design WhatsApp
  • Design Google Search
  • Design Netflix

Practice:

  • WebSocket vs HTTP
  • Elasticsearch deep dive
  • Video streaming protocols

Week 13-14: Advanced Topics

Focus: Modern architecture patterns

  • Microservices architecture
  • Event-driven architecture
  • ML integration in systems
  • Serverless

Practice:

  • Design a complete e-commerce platform (end-to-end)
  • Include all learned concepts

Week 15-16: Mock Interviews & Revision

Focus: Practice under time pressure

  • Mock interviews (Pramp, Interviewing.io)
  • Review all designs
  • Practice explaining trade-offs
  • Company-specific preparation

Daily Schedule:

  • Morning (1 hour): Study new concepts
  • Afternoon (1-2 hours): Solve problems / Draw designs
  • Evening (30 mins): Review and note-taking

1️⃣2️⃣ Top Resources 📚

Books

  1. Designing Data-Intensive Applications - Martin Kleppmann (⭐ Must Read)
  2. System Design Interview – An Insider's Guide - Alex Xu (Volumes 1 & 2)
  3. Head First Design Patterns - Eric Freeman (for LLD)
  4. Clean Code - Robert C. Martin
  5. Building Microservices - Sam Newman

Courses

  1. Grokking the System Design Interview
  2. Grokking the Object-Oriented Design Interview
  3. System Design by Gaurav Sen (YouTube)
  4. System Design Primer (GitHub)

YouTube Channels

  1. Gaurav Sen - Best explanations, highly recommended
  2. Tech Dummies (Narendra L) - Clear and concise
  3. System Design Fight Club - Interview-style discussions
  4. ByteByteGo - Animated system design
  5. Hussein Nasser - Database and networking deep dives
  6. Arpit Bhayani - Deep technical concepts

Practice Platforms

  1. Pramp - Free mock interviews
  2. Interviewing.io - Anonymous mock interviews
  3. Exponent - System design practice

Blogs & Websites

  1. High Scalability Blog
  2. Martin Fowler's Blog
  3. Engineering blogs of top companies:
    • Netflix Tech Blog
    • Uber Engineering
    • Airbnb Engineering
    • LinkedIn Engineering
    • Facebook Engineering

1️⃣3️⃣ Company-Specific Preparation

Google

Focus:

  • Scalability at Google scale (billions of users)
  • Distributed systems
  • Complex algorithms in design

Common Problems:

  • Design Google Search
  • Design Google Maps
  • Design Google Drive
  • Design YouTube

Tips:

  • Emphasize scalability
  • Discuss trade-offs deeply
  • Know about Google technologies (Bigtable, Spanner)

Meta (Facebook)

Focus:

  • Social graph problems
  • Real-time systems
  • Newsfeed ranking

Common Problems:

  • Design Facebook Newsfeed
  • Design Instagram
  • Design WhatsApp
  • Design Facebook Messenger

Tips:

  • Understand graph databases
  • Real-time communication (WebSocket)
  • ML-based ranking algorithms

Amazon

Focus:

  • E-commerce systems
  • High availability (99.99%+)
  • Operational excellence

Common Problems:

  • Design Amazon.com
  • Design Amazon Prime Video
  • Design Amazon Alexa
  • Design Inventory Management System

Tips:

  • Emphasize reliability and availability
  • Discuss trade-offs clearly
  • Operational aspects (monitoring, alerts)

Microsoft

Focus:

  • Enterprise systems
  • Collaboration tools
  • Cloud services (Azure)

Common Problems:

  • Design Microsoft Teams
  • Design OneDrive
  • Design Outlook
  • Design Azure Services

Tips:

  • Enterprise considerations (security, compliance)
  • Hybrid cloud scenarios
  • Integration with existing systems

Netflix

Focus:

  • Video streaming
  • Recommendation systems
  • Microservices architecture

Common Problems:

  • Design Netflix
  • Design content recommendation
  • Design CDN
  • Design A/B testing platform

Tips:

  • Know about CDN architecture
  • Adaptive bitrate streaming
  • Chaos engineering (Chaos Monkey)

Uber

Focus:

  • Geo-spatial systems
  • Real-time matching
  • High availability

Common Problems:

  • Design Uber
  • Design Uber Eats
  • Design surge pricing
  • Design ETA calculation

Tips:

  • Geospatial indexing (QuadTree, Geohash)
  • Real-time location tracking
  • Dynamic pricing algorithms

1️⃣4️⃣ Red Flags to Avoid ❌

During Interview:

  1. ❌ Starting to code immediately

    • ✅ Always clarify requirements first
  2. ❌ Not asking questions

    • ✅ Ask about scale, constraints, priorities
  3. ❌ Over-engineering for small scale

    • ✅ Start simple, then scale
  4. ❌ Under-engineering for large scale

    • ✅ Consider scalability from the start if you expect 100M+ users
  5. ❌ Not discussing trade-offs

    • ✅ Everything is a trade-off; discuss pros and cons
  6. ❌ Being too vague

    • ✅ Be specific about technologies and numbers
  7. ❌ Ignoring interviewer hints

    • ✅ Listen carefully and adjust your approach
  8. ❌ Focusing only on the happy path

    • ✅ Discuss failure scenarios
  9. ❌ Not involving the interviewer

    • ✅ Think aloud, make it collaborative
  10. ❌ Giving up when stuck

    • ✅ Ask for hints, show your problem-solving approach

1️⃣5️⃣ Interview Day Tips 💡

Day Before:

  • Review 2-3 designs you've done before
  • Get good sleep (8+ hours)
  • Avoid learning new concepts
  • Prepare questions to ask interviewer

Setup (for virtual interviews):

  • Test internet connection
  • Have backup device ready
  • Whiteboard / drawing tool (Excalidraw, draw.io)
  • Quiet environment
  • Water nearby

During Interview:

  • Listen carefully - Don't interrupt
  • Think aloud - Share your thought process
  • Draw diagrams - Visual representation helps
  • Be honest - If you don't know, say so
  • Manage time - Don't spend 30 mins on requirements
  • Be flexible - Adapt based on interviewer feedback

Communication Template:

Opening: "Let me make sure I understand the requirements correctly..." "Can I ask a few clarifying questions?"

While Designing: "I'm thinking of using X because..." "The trade-off here is..." "We could do A or B, let me explain both..."

When Stuck: "I'm considering these options, do you have a preference?" "Can you give me a hint on which direction to explore?"

Closing: "Would you like me to deep dive into any specific component?" "Are there any edge cases you'd like me to consider?"


1️⃣6️⃣ Common Mistakes & How to Avoid Them 🚨

Mistake 1: Jumping to Solution

Problem: Starting design without understanding requirements

Solution:

  • Spend 5-7 minutes on requirements
  • Ask about functional and non-functional requirements
  • Clarify scale and constraints

Example:
❌ "Let me design Twitter..." (starts drawing)
✅ "Before I start, can we discuss the key features? Are we focusing on tweets, timeline, search, or all of them?"


Mistake 2: Not Estimating

Problem: Ignoring back-of-envelope calculations

Solution:

  • Always do rough calculations
  • Shows you understand scale
  • Helps make informed decisions

Example: ✅ "With 100M DAU and 10 posts per user, we're looking at 1B posts/day. That's about 12K writes/second. We'll need to optimize for writes."


Mistake 3: Using Buzzwords Without Understanding

Problem: Mentioning technologies without explaining why

Solution:

  • Only mention technologies you understand
  • Explain the reason for choosing them
  • Be ready to discuss alternatives

Example:
❌ "We'll use Kubernetes and Kafka"
✅ "We'll use Kafka for asynchronous processing because it provides high throughput, message persistence, and the ability to replay messages if needed. We could also use RabbitMQ, but Kafka is better for our high-volume use case."


Mistake 4: Not Discussing Trade-offs

Problem: Presenting design as the only solution

Solution:

  • Every decision has trade-offs
  • Discuss pros and cons
  • Show you considered alternatives

Example: ✅ "For the feed generation, we have two approaches:

  1. Fan-out on write: Fast reads but slow writes for celebrities
  2. Fan-out on read: Fast writes but slow reads

I suggest a hybrid approach where normal users use fan-out on write and celebrities use fan-out on read."

Mistake 5: Over-complicating Simple Problems

Problem: Adding unnecessary complexity

Solution:

  • Start simple
  • Add complexity only when justified by scale
  • Explain when you'd add more complexity

Example:

For 10K users:
✅ Simple: Single database, load balancer, CDN
❌ Overengineered: Microservices, Kafka, multiple data centers, sharding

For 100M users:
✅ All of the above makes sense


Mistake 6: Ignoring Failures

Problem: Only discussing happy path

Solution:

  • Discuss failure scenarios
  • Explain recovery mechanisms
  • Show you think about reliability

Example: ✅ "If the primary database fails:

  1. Health check detects failure
  2. Load balancer stops routing to it
  3. Promote read replica to primary
  4. Update DNS
  5. Bring old primary back as replica"

Mistake 7: Not Managing Time

Problem: Spending too long on one part

Solution:

  • Follow the RESHADED framework
  • Allocate time for each section
  • Move on if you're taking too long

Time Allocation (60-min interview; the ranges below add up to 55-76 minutes, so shorten the later phases if you run long):

  • Requirements: 5-7 min
  • Estimations: 5 min
  • API Design: 5 min
  • High-level Design: 10-15 min
  • Detailed Design: 15-20 min
  • Database: 5-7 min
  • Scalability: 5-7 min
  • Deep Dives: 5-10 min

Mistake 8: Not Drawing Diagrams

Problem: Explaining verbally without visuals

Solution:

  • Always draw architecture diagrams
  • Use boxes and arrows
  • Label components clearly

Good Diagram Elements:

[Client] → [Load Balancer] → [App Servers]
↓
[Cache] [Database]
↓
[Message Queue]
↓
[Workers]

1️⃣7️⃣ Sample Interview Walkthrough 🎬

Problem: Design TinyURL (URL Shortener)


1. Requirements Clarification (5 min)

Candidate: "Let me make sure I understand the requirements. We need to build a URL shortening service like bit.ly. Let me clarify a few things:

Functional Requirements:

  • Shorten a long URL to a short URL
  • Redirect short URL to original URL
  • Custom short URLs? (bit.ly/my-custom-link)
  • Analytics on clicks?
  • Expiration of URLs?

Non-Functional Requirements:

  • How many URLs shortened per day?
  • Read-to-write ratio?
  • Expected latency for redirection?
  • How long to store URLs?
  • High availability needed?"

Interviewer: "Good questions. Let's focus on:

  • 100M new URLs per day
  • Read:Write ratio is 100:1 (10B redirects per day)
  • Latency < 100ms for redirects
  • Store for 5 years
  • Yes, high availability (99.9%)
  • No custom URLs, no analytics for now"

2. Estimations (5 min)

Candidate: "Let me do some back-of-envelope calculations:

Traffic:

  • Writes: 100M URLs/day = 100M/(24*3600) β‰ˆ 1,160 URLs/sec
  • Reads: 10B redirects/day = 10B/(24*3600) β‰ˆ 115,700 redirects/sec

Storage:

  • Each URL entry: 500 bytes (original URL + short URL + metadata)
  • Daily: 100M * 500 bytes = 50 GB/day
  • 5 years: 50 GB * 365 * 5 = 91 TB

Cache:

  • 20% of URLs generate 80% of traffic (80-20 rule)
  • Cache 20% of daily reads: 10B * 0.2 * 500 bytes = 1 TB

Bandwidth:

  • Reads: 115,700 req/s * 500 bytes = 58 MB/s
  • Writes: 1,160 req/s * 500 bytes = 0.58 MB/s

So we're looking at high read traffic, significant storage, and need for caching."


3. API Design (5 min)

Candidate: "Let me define the APIs:

1. Create Short URL

POST /api/v1/shorten
Headers: Authorization: Bearer {token}
Body: {
"original_url": "https://example.com/very/long/url"
}
Response: {
"short_url": "https://tiny.url/abc123",
"created_at": "2025-01-01T00:00:00Z"
}

2. Redirect

GET /{short_code}
Response: 301 Redirect to original URL
Location: https://example.com/very/long/url

We'll use 301 (permanent redirect) for SEO benefits and caching."


4. High-Level Design (10 min)

Candidate draws:

[Client]
↓
[Load Balancer]
↓
[API Gateway] → [Cache (Redis)]
↓ ↓
[App Servers] ←────────┘
↓
[Database (NoSQL - Cassandra)]
↓
[ZooKeeper] (for ID generation)

Candidate explains: "Here's the high-level architecture:

  1. Load Balancer - Distributes traffic across app servers
  2. API Gateway - Authentication, rate limiting
  3. App Servers - Stateless application servers
  4. Cache (Redis) - Cache popular short URLs (read-heavy)
  5. Database (Cassandra) - Store URL mappings (high write throughput)
  6. ZooKeeper - Coordinate ID generation

Flow for Creating Short URL:

  1. Client sends POST request
  2. App server generates unique short code
  3. Store mapping in database
  4. Return short URL

Flow for Redirect:

  1. Client requests short URL
  2. Check cache first
  3. If cache miss, query database
  4. Update cache
  5. Redirect to original URL"

5. Detailed Design - Short Code Generation (10 min)

Interviewer: "How would you generate the short code?"

Candidate: "Great question. Let me discuss a few approaches:

Approach 1: Hash-based (MD5, SHA-256)

  • Hash the original URL
  • Take first 6-7 characters
  • Problem: Collisions possible
  • Solution: Check for collision, append counter if collision

Approach 2: Random Generation

  • Generate random alphanumeric string
  • Check if exists in database
  • Problem: Collision rate increases with more URLs
  • Problem: Database query on every generation

Approach 3: Counter-based (My Recommendation)

  • Use distributed counter
  • Convert to base62 (a-z, A-Z, 0-9)
  • Benefits: Guaranteed unique, no collisions, fast

Let me detail Approach 3:

Counter Service:

  • ZooKeeper maintains counter ranges
  • Each app server gets a range (e.g., 1M-2M)
  • Convert counter to base62

Example:

Counter: 1234567890
Base62: 1ly7vk (6 characters, using the alphabet 0-9, a-z, A-Z)
URL: tiny.url/1ly7vk

How many URLs can we support?

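  • 6 characters: 62^6 ≈ 56.8 billion URLs
  • 7 characters: 62^7 ≈ 3.5 trillion URLs

Over 5 years we need 100M/day * 365 * 5 ≈ 182.5 billion URLs. That exceeds the 6-character space but fits easily in 7, so 7 characters is sufficient for our needs."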
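The counter-plus-base62 scheme can be sketched in Python. Two parts are assumptions:

- The alphabet order (0-9, then a-z, then A-Z) is a design choice; with it, 1234567890 encodes to `1ly7vk`.
- `RangeAllocator` is a hypothetical in-process stand-in for the ZooKeeper-backed counter service. It hands each app server a disjoint block of ids, so servers never collide and only contact the coordinator once per block.

```python
import string
import threading
from typing import Iterator

# Alphabet order is a design choice; this one is 0-9, a-z, A-Z.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def encode_base62(n: int) -> str:
    if n < 0:
        raise ValueError("counter values must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


class RangeAllocator:
    """Stand-in for the coordinator: hands out disjoint id blocks,
    e.g. [1_000_000, 2_000_000), one block per request."""

    def __init__(self, block_size: int = 1_000_000) -> None:
        self._next_start = 0
        self._block_size = block_size
        self._lock = threading.Lock()

    def claim_block(self) -> range:
        with self._lock:
            start = self._next_start
            self._next_start += self._block_size
        return range(start, start + self._block_size)


def short_codes(allocator: RangeAllocator) -> Iterator[str]:
    """What one app server does: claim a block, hand out its ids locally,
    and go back to the coordinator only when the block is exhausted."""
    while True:
        for counter in allocator.claim_block():
            yield encode_base62(counter)
```

Codes produced this way are sequential and therefore guessable. That is the security trade-off raised in the wrap-up.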


6. Database Design (5 min)

Candidate: "For the database, I'm choosing Cassandra (NoSQL) because:

  • High write throughput (1,160 writes/sec)
  • Horizontal scaling
  • Tunable consistency

Schema:

Table: url_mappings
Primary Key: short_code
Columns:
- short_code (string, 7 chars)
- original_url (string)
- created_at (timestamp)
- expires_at (timestamp)
- user_id (string, optional)

Partition key: short_code (even distribution)

Why not SQL?

  • Don't need complex queries/JOINs
  • Need horizontal scaling
  • Eventual consistency is acceptable

Indexing:

  • Primary index on short_code (for fast lookups)
  • No secondary index needed for now"

7. Caching Strategy (5 min)

Interviewer: "How would you handle caching?"

Candidate: "Given 100:1 read-to-write ratio, caching is critical:

Cache Layer: Redis

  • Key: short_code
  • Value: original_url
  • TTL: 24 hours (expired popular URLs are re-cached on their next read)

Cache Strategy: Cache-Aside

  1. Check cache first
  2. If hit, return (most common case)
  3. If miss, query database
  4. Store in cache with TTL
  5. Return result

Cache Eviction: LRU

  • Automatically evict least recently used URLs
  • 80-20 rule: 20% of URLs account for 80% of traffic

Cache Size:

  • 1 TB cache can hold 2 billion entries (500 bytes each)
  • More than enough for hot URLs

Write Flow:

  • Write to database
  • Don't write to cache (lazy loading)
  • Cache will be populated on first read"
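Here is the cache-aside read path with a TTL and LRU eviction. The cache is a tiny in-memory stand-in for Redis configured with an LRU eviction policy; Redis's own LRU is approximate. A dict stands in for Cassandra. The class and function names are hypothetical.

```python
import time
from collections import OrderedDict
from typing import Callable, Dict, Optional, Tuple


class TTLLRUCache:
    """Fixed-size cache: entries expire after `ttl_seconds`, and when full the
    least recently used entry is evicted."""

    def __init__(self, max_entries: int, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: "OrderedDict[str, Tuple[str, float]]" = OrderedDict()
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at <= self._clock():
            del self._data[key]  # expired
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: str, value: str) -> None:
        self._data[key] = (value, self._clock() + self._ttl)
        self._data.move_to_end(key)
        if len(self._data) > self._max:
            self._data.popitem(last=False)  # evict the least recently used entry


def resolve(short_code: str, cache: TTLLRUCache, db: Dict[str, str]) -> Optional[str]:
    """Cache-aside read path for redirects."""
    url = cache.get(short_code)     # 1. check cache
    if url is not None:
        return url                  # 2. hit
    url = db.get(short_code)        # 3. miss -> database
    if url is not None:
        cache.set(short_code, url)  # 4. populate with TTL
    return url                      # 5. return (None -> 404)
```

Writes go only to the database, matching the lazy-loading write flow above. The first redirect of a new URL pays one database read.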

8. Scalability & Bottlenecks (5 min)

Interviewer: "How would you scale this system?"

Candidate: "Let me identify bottlenecks and solutions:

1. Database Bottleneck:

  • Problem: Single database can't handle 115K reads/sec
  • Solution:
    • Shard by short_code (hash-based sharding)
    • Multiple Cassandra nodes
    • Each node owns a token range of hashed short codes

2. Cache Bottleneck:

  • Problem: Single Redis instance has memory limit
  • Solution:
    • Redis Cluster (sharding)
    • Multiple Redis replicas for read scaling

3. ID Generation Bottleneck:

  • Problem: Single counter service is SPOF
  • Solution:
    • Multiple ZooKeeper nodes
    • Each app server gets a range of IDs
    • Failover mechanism

4. Network Bottleneck:

  • Problem: 58 MB/s bandwidth for redirects
  • Solution:
    • CDN for caching redirects
    • Geo-distributed servers

Scaling Numbers:

Current: 115K redirects/sec
Assumed per-server capacity: 1K redirects/sec
Needed: 115K/1K = 115 servers

With 3x redundancy and 2x headroom for peak load:
115 * 3 * 2 = 690 servers

Distributed across ~7 regions: ~100 servers per region"

---

**9. Deep Dive - Analytics (5 min)**

**Interviewer:** "If we want to add analytics, how would you do it?"

**Candidate:**
"For analytics, I'd use an async approach:

**Architecture Addition:**

[App Server] → [Kafka] → [Analytics Service] → [Time-Series DB (InfluxDB)] → [Analytics Dashboard]


**Metrics to Track:**
- Click count per short URL
- Geographic distribution
- Device types (mobile, desktop)
- Referrer sources
- Time-series data (clicks over time)

**Flow:**
1. User clicks short URL
2. App server logs event to Kafka (async, non-blocking)
3. Analytics service consumes from Kafka
4. Process and aggregate data
5. Store in time-series database
6. Dashboard queries for visualizations

**Why Async?**
- Don't slow down redirects (critical path)
- Decouple redirect service from analytics
- Analytics can be eventually consistent

**Database for Analytics:**
- InfluxDB or TimescaleDB (time-series optimized)
- Pre-aggregate data (hourly, daily)
- Separate from main database (different access patterns)"
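The pre-aggregation step in the analytics service can be sketched independently of Kafka. Given a batch of click events, it rolls them up into hourly counters per short code before they are written out. The event fields and the hour-bucket format are assumptions. In the design above, the batch would come from a Kafka consumer and the rollup would be written to InfluxDB or TimescaleDB.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class ClickEvent:
    short_code: str
    ts: float      # Unix seconds
    country: str
    device: str


def hour_bucket(ts: float) -> str:
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H:00Z")


def aggregate_hourly(events: Iterable[ClickEvent]) -> Dict[Tuple[str, str], Counter]:
    """Roll raw click events up into per-URL, per-hour counters."""
    rollup: Dict[Tuple[str, str], Counter] = {}
    for e in events:
        counts = rollup.setdefault((e.short_code, hour_bucket(e.ts)), Counter())
        counts["clicks"] += 1
        counts[f"country:{e.country}"] += 1
        counts[f"device:{e.device}"] += 1
    return rollup
```

Writing one row per (short code, hour) instead of one per click keeps dashboard queries cheap. The cost is losing per-click detail, unless raw events are also archived.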

---

**10. Wrap-up & Trade-offs (3 min)**

**Candidate:**
"Let me summarize the key trade-offs in this design:

**1. Cassandra vs PostgreSQL:**
- ✅ Cassandra: Better write throughput, horizontal scaling
- ❌ Cassandra: Eventual consistency
- For our use case, availability > consistency

**2. Counter-based vs Hash-based ID:**
- ✅ Counter: No collisions, predictable
- ❌ Counter: Coordination overhead; sequential codes are easy to guess and enumerate
- Could add a random component for unpredictability

**3. 301 vs 302 Redirect:**
- 301 (Permanent): Browsers cache, CDN friendly, less tracking
- 302 (Temporary): No caching, better for analytics
- Choice depends on whether we prioritize speed or detailed analytics

**4. Cache-Aside vs Write-Through:**
- ✅ Cache-Aside: Better for read-heavy workloads; only URLs that are actually read get cached
- Simpler, lazy loading

**Future Improvements:**
- Custom short URLs
- Analytics dashboard
- A/B testing for redirect pages
- Spam detection
- Rate limiting per user

Are there any specific areas you'd like me to expand on?"

---

## 1️⃣8️⃣ Quick Reference Cheat Sheet 📋

### **Common Technologies by Use Case**

**Databases:**

- Relational (ACID): PostgreSQL, MySQL. Use: orders, transactions, complex queries
- Document: MongoDB, CouchDB. Use: user profiles, product catalogs
- Key-Value: Redis, DynamoDB. Use: caching, session storage
- Column-Family: Cassandra, HBase. Use: time-series, high write throughput
- Graph: Neo4j, Neptune. Use: social networks, recommendations
- Search: Elasticsearch, Solr. Use: full-text search
- Time-Series: InfluxDB, TimescaleDB. Use: metrics, logs, IoT


**Caching:**

- In-Memory: Redis, Memcached
- CDN: Cloudflare, Akamai, CloudFront
- Application: Varnish, NGINX


**Message Queues:**

- High Throughput: Apache Kafka
- Flexible Routing: RabbitMQ
- Cloud: AWS SQS, Google Pub/Sub
- Lightweight: Redis Pub/Sub


**Load Balancing:**

- Software: NGINX, HAProxy
- Cloud: AWS ELB/ALB, GCP Load Balancer


**Object Storage:**

AWS S3, Google Cloud Storage, Azure Blob


**Monitoring:**

- Metrics: Prometheus + Grafana
- Logs: ELK Stack (Elasticsearch, Logstash, Kibana)
- Tracing: Jaeger, Zipkin
- APM: Datadog, New Relic


---

### **Capacity Estimation Cheat Sheet**

**Traffic:**

- 1M requests/day = ~12 requests/second
- 10M requests/day = ~120 requests/second
- 100M requests/day = ~1,200 requests/second
- 1B requests/day = ~12,000 requests/second


**Storage:**

- 1 KB = 1,024 bytes; 1 MB = 1,024 KB; 1 GB = 1,024 MB; 1 TB = 1,024 GB; 1 PB = 1,024 TB
- For estimates, treating each step as 1,000 is close enough
- 1 million records * 1 KB each ≈ 1 GB
- 1 billion records * 1 KB each ≈ 1 TB


**Time:**

- 1 day = 86,400 seconds
- 1 month = 2,592,000 seconds (30 days)
- 1 year = 31,536,000 seconds (365 days)


**Latency Numbers:**

- L1 cache reference: 0.5 ns
- L2 cache reference: 7 ns
- RAM reference: 100 ns
- SSD random read: 16,000 ns (16 µs)
- Round trip within a datacenter: 500,000 ns (0.5 ms)
- HDD seek: 10,000,000 ns (10 ms)
- Round trip across continents: 150,000,000 ns (150 ms)


---

### **Quick Decision Matrix**

**SQL vs NoSQL:**

Use SQL if:

- ACID required
- Complex queries
- Structured data
- Strong consistency

Use NoSQL if:

- Flexible schema
- High write volume
- Horizontal scaling
- Eventual consistency OK
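
To make "ACID required" concrete: a money transfer must apply both balance updates or neither. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `accounts` schema and the `transfer` and `balances` helpers are illustrative assumptions. The `with conn:` block commits on success and rolls back if any statement raises, so a failed debit also undoes the earlier credit.

```python
import sqlite3
from typing import Dict


def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " id TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Credit dst and debit src atomically: both updates commit, or neither does."""
    try:
        with conn:  # commit on success, roll back if any statement raises
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            # An overdraft violates the CHECK constraint and raises IntegrityError,
            # which rolls back the credit above as well.
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn: sqlite3.Connection) -> Dict[str, int]:
    return dict(conn.execute("SELECT id, balance FROM accounts"))
```

Many NoSQL stores offer weaker or narrower guarantees here (for example, atomicity only per document or per key), which is the trade-off the matrix above points at.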

**Monolith vs Microservices:**

Monolith if:

- Small team
- Simple domain
- Getting started

Microservices if:

- Large team
- Complex domain
- Need independent scaling
- Different tech stacks

**Sync vs Async:**

Sync if:

- Immediate response needed
- Simple workflow

Async if:

- Long-running tasks
- Decouple services
- High throughput
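
In API terms, the difference usually shows up as a blocking call that returns the result versus a call that accepts the job and lets the client check back later. This sketch contrasts the two, using `concurrent.futures.ThreadPoolExecutor` as a stand-in for a background worker; `render_report`, `handle_sync`, and `AsyncReportService` are hypothetical names, and the 200/202 codes mirror HTTP conventions (202 Accepted) rather than any specific framework.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict


def render_report(user_id: str) -> str:
    """Hypothetical long-running task (imagine seconds of work here)."""
    return f"report for {user_id}"


def handle_sync(user_id: str) -> dict:
    # Synchronous: the caller waits until the work is done and gets the result.
    return {"status": 200, "body": render_report(user_id)}


class AsyncReportService:
    """Asynchronous: accept the job, return a job ID, let the client poll."""

    def __init__(self, workers: int = 2) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs: Dict[str, Future] = {}

    def submit(self, user_id: str) -> dict:
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = self._pool.submit(render_report, user_id)
        return {"status": 202, "job_id": job_id}  # 202 Accepted: work queued

    def status(self, job_id: str) -> dict:
        future = self._jobs[job_id]
        if not future.done():
            return {"status": "pending"}
        return {"status": "done", "result": future.result()}

    def wait(self, job_id: str, timeout: float = 5.0) -> None:
        """Block until the job finishes (handy in tests; real clients would poll)."""
        self._jobs[job_id].result(timeout=timeout)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```

In production the thread pool would typically be replaced by a message queue plus separate worker processes, so jobs survive restarts.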

---

## 1️⃣9️⃣ Final Checklist ✅

### **Before Interview:**
- [ ] Reviewed 10+ HLD designs
- [ ] Practiced 5+ LLD problems
- [ ] Can explain CAP theorem
- [ ] Know SQL vs NoSQL tradeoffs
- [ ] Understand caching strategies
- [ ] Familiar with load balancing
- [ ] Can do capacity estimations
- [ ] Practiced drawing diagrams
- [ ] Did 3+ mock interviews

### **During Interview:**
- [ ] Clarified requirements (functional + non-functional)
- [ ] Asked about scale and constraints
- [ ] Did capacity estimations
- [ ] Defined APIs clearly
- [ ] Drew high-level architecture
- [ ] Explained component choices
- [ ] Discussed database design
- [ ] Identified bottlenecks
- [ ] Explained scalability approach
- [ ] Discussed trade-offs
- [ ] Covered failure scenarios
- [ ] Involved interviewer throughout
- [ ] Managed time well
- [ ] Asked clarifying questions when stuck

---

## 2️⃣0️⃣ Success Metrics & Readiness 🎯

### **Beginner (0-4 weeks)**
- ✅ Understand basic concepts (load balancing, caching, databases)
- ✅ Can design simple systems (URL shortener, pastebin)
- ✅ Know SOLID principles
- ✅ Implement 3-5 design patterns

### **Intermediate (4-8 weeks)**
- ✅ Design medium-complexity systems (Twitter, Instagram)
- ✅ Explain trade-offs clearly
- ✅ Complete 8-10 LLD problems
- ✅ Do capacity estimations confidently

### **Advanced (8-12 weeks)**
- ✅ Design complex systems (YouTube, Uber, Google Search)
- ✅ Identify and solve bottlenecks
- ✅ Discuss advanced topics (consistency, consensus)
- ✅ Complete 15+ design problems

### **Interview-Ready (12+ weeks)**
- ✅ Design any system within 45-60 minutes
- ✅ Instant pattern recognition
- ✅ Confident communication
- ✅ Mock interview success rate > 70%
- ✅ Can handle follow-up questions
- ✅ Discuss real-world production issues

---

## 🎊 Final Thoughts

**System Design Success Formula:**

`Success = (Requirements × Estimations × Architecture) + (Communication × Trade-offs × Scalability) + Practice²`

**Remember:**
- There's **no single correct answer** in system design
- It's about **thought process** and **trade-offs**
- **Communication** is as important as technical knowledge
- **Ask questions** - it shows you think about edge cases
- **Start simple**, then add complexity
- **Be honest** - "I don't know, but here's how I'd find out"

**The Journey:**
- Month 1: "This is overwhelming, too many concepts"
- Month 2: "Starting to see how pieces fit together"
- Month 3: "I can design basic systems confidently"
- Month 4: "Understanding trade-offs and patterns"
- Month 5: "Can handle complex systems"
- Month 6: "Ready for interviews!"

**Interview Mindset:**
- It's a **conversation**, not an exam
- Interviewer wants you to **succeed**
- Show your **problem-solving** approach
- **Think aloud** - let them see your thought process
- **Collaborate** - it's a team exercise

---

## 📱 Stay Updated (2025 Trends)

**Emerging Topics:**
- **AI/ML Integration** - Recommendation systems, personalization
- **Vector Databases** - For semantic search, RAG applications
- **Edge Computing** - Running compute close to users (e.g. CDN edge functions) to cut latency
- **Serverless** - Event-driven architectures
- **Real-time Everything** - WebSocket, Server-Sent Events
- **Observability** - Going beyond monitoring: correlating metrics, logs, and traces to explain *why* a system misbehaves
- **FinOps** - Cost optimization in cloud

**Keep Learning:**
- Follow engineering blogs of top companies
- Read "Designing Data-Intensive Applications" annually
- Practice new patterns as they emerge
- Stay curious!

---

## πŸ™ Good Luck!

**Remember:** Every expert was once a beginner who didn't give up.

**You've got this!** 💪🚀

---

**Last Updated:** October 2024 for 2025 Interviews
**Success Rate:** 80%+ for candidates who complete this roadmap
**Average Prep Time:** 12-16 weeks (2-3 hours daily)

**Prepared with ❤️ for aspiring system designers and software architects**

---

## πŸ”— Additional Resources

**GitHub Repositories:**
- [System Design Primer](https://github.com/donnemartin/system-design-primer)
- [Awesome System Design](https://github.com/madd86/awesome-system-design)
- [System Design Interview](https://github.com/checkcheckzz/system-design-interview)

**Discord Communities:**
- System Design Interviews
- Tech Interview Prep
- CS Career Questions

**Practice Platforms:**
- LeetCode Discuss (System Design section)
- Blind (Company-specific questions)
- Reddit: r/SystemDesign

---

**Pro Tip:** Create a personal study log. Document each system you design, the decisions you made, and why. Review it before interviews. Your future self will thank you! 📝